Deduplicating streaming data at scale

Here at Qubit we’ve been known to discuss customer data, both quantitative and qualitative, and the value that comes from analyzing it.

It’s pretty much our thing.

We’re delighted when brands use Qubit to dig into their data and uncover what makes their site visitors tick, deliver great personalizations and build precisely targeted segments. We’re also hugely excited when we can share insights drawn from our data that are useful for the whole of the personalization space.

We enable over 400 million personalized experiences per month. Doing that means processing vast quantities of data from websites, customer touchpoints and other systems in near real time. There’s a whole world of creative problem-solving and technological wizardry beneath the surface that makes reliable, user-friendly personalization delivery possible.

Jibran Saithi, Lead Architect at Qubit, has written a blog post explaining how we tackle one of the challenges of handling large-scale data from multiple sources: duplication of data.

It may not sound sexy, but it is vital. We have to deduplicate data to make sure we keep all the information we need, with no repeats that could throw off results and introduce inaccuracies, right down to whether a personalization worked and by how much.
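To make the general idea concrete, here is a minimal, illustrative sketch of stream deduplication: assume each incoming event carries a unique ID, and drop any event whose ID has been seen recently. The class name, the bounded in-memory cache and the eviction policy are all assumptions made for this example; they are not Qubit’s actual implementation, which is the subject of Jibran’s post.

```python
from collections import OrderedDict


class StreamDeduplicator:
    """Drops events whose ID has already been seen within a bounded window.

    An in-memory sketch only; a production pipeline would typically back
    this with a distributed store and handle late or out-of-order data.
    """

    def __init__(self, max_ids: int = 1_000_000):
        # Keep the most recently seen IDs; evict the oldest once full so
        # memory stays bounded even on an unbounded stream.
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def is_duplicate(self, event_id: str) -> bool:
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh recency
            return True
        self._seen[event_id] = True
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # evict least recently seen ID
        return False


# Usage: only forward an event if its ID has not been seen recently.
dedup = StreamDeduplicator(max_ids=3)
events = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {"id": "c"}]
unique = [e for e in events if not dedup.is_duplicate(e["id"])]
print(unique)  # [{'id': 'a'}, {'id': 'b'}, {'id': 'c'}]
```

The trade-off in a sketch like this is memory versus accuracy: a bounded cache can let a very old duplicate slip through, which is exactly the kind of problem that gets harder at streaming scale.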

To find out how we overcame this challenge, read Jibran’s post.
