Right, but what IS Search-Based Data Discovery?

September 1, 2014

By Chris von Csefalvay


I’m sure I’m not alone in dreading the question ‘what is it you do?’. It’s difficult enough to explain what systems architects do – but explaining that I work in ‘search-based data discovery’ (SBDD) is almost guaranteed to draw blank stares. I might be tempted to fib slightly and say I work in Big Data, but that immediately evokes either huge spreadsheets and massive OLAP cubes, which is not quite what SBDD is, or some Orwellian nightmare, which it is even less. And it’s not the audience’s fault, either – SBDD is a newcomer to the Big Data analytics field and, in many ways, a term crying out for a definition.

The term originated with US-based IT consultancy Gartner, who define it as tools that

enable users to develop and refine views and analyses of structured and unstructured data using search terms. Like visualization-based data discovery tools, they have three attributes: (1) a proprietary data structure to store and model data gathered from disparate sources, which minimizes reliance on predefined business intelligence (BI) metadata; (2) a built-in performance layer using RAM or indexing that lessens the need for aggregates, summaries and pre-calculations; and (3) an intuitive interface, enabling users to explore data without much training. However, as well as having a broader scope (visualization-driven data discovery tools focus exclusively on quantitative data) they differ at the user interface layer, with search-based data discovery tools using text search input and results to guide users to the information they need.

That, I suppose, is as good a definition as any, but it’s not exactly something that rolls off the tongue, never mind that by the time you’re through points (2) or (3), the dinner party is over and your date has fled for the hills. Clearly, we need something handier. Here’s one we’ve come up with at Squirro, where SBDD is what we eat, drink, live, breathe and bleed.

We’re building noise-cancelling headphones for your data.

SBDD is the clever little tool that lets you enjoy Beethoven’s 5th (or the latest set from Hudson Mohawke) on a plane full of screaming infants, a stag party and some rambunctious school kids on a class trip (and a massive sugar rush). Once you have decided what it is you care about (that is, to stick with the analogy, selected your song), SBDD can help you discover what data exists out there. SBDD might have ‘search’ at the start, but it is as far from Google as it gets. Rather, SBDD lets you find things you did not know you did not know. By filtering out the noise from what you know and don’t care about, SBDD helps you focus on the possible ‘unknown unknowns’.

This is not a particularly difficult exercise when it comes to structured data. It gets problematic when your query meets the messy reality of people using different words for the same concept, mistyping words and plenty of other issues. Let’s face it – humans are messy. And searching messy data is challenging. That’s exactly where SBDD shines. Unlike Search-based Data Retrieval (aka ‘Google’), SBDD lets you figure out where you want to search as you search – in an iterative process of refining, slicing and dicing, you find the data and the correlations you need.
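
To make the ‘messy humans’ problem a little more concrete, here is a toy sketch in Python. It is not how Squirro (or any real SBDD product) works; the synonym table, the function names and the sample documents are all made up for illustration. It only shows the two headaches mentioned above: different words for the same concept, and typos.

```python
import difflib

# Hypothetical synonym table (an assumption for this sketch):
# every known word maps to one canonical concept.
SYNONYMS = {
    "laptop": "notebook",
    "notebook": "notebook",
    "invoice": "invoice",
    "bill": "invoice",
}


def normalise(word):
    """Map a possibly misspelt word to its canonical concept, or None."""
    word = word.lower()
    # get_close_matches tolerates small typos such as 'invocie'.
    match = difflib.get_close_matches(word, list(SYNONYMS), n=1, cutoff=0.8)
    return SYNONYMS[match[0]] if match else None


def search(query, documents):
    """Return the documents sharing at least one concept with the query."""
    wanted = {normalise(w) for w in query.split()} - {None}
    hits = []
    for doc in documents:
        concepts = {normalise(w) for w in doc.split()} - {None}
        if wanted & concepts:
            hits.append(doc)
    return hits


# Made-up sample data: one synonym, one typo, one piece of noise.
DOCS = [
    "customer disputes the bill",
    "new notebok ordered for sales",
    "lunch menu for friday",
]
```

With these sample documents, `search("invocie", DOCS)` should surface the ‘bill’ document despite the typo in the query, and `search("laptop", DOCS)` should find the misspelt ‘notebok’ one. The iterative refining, slicing and dicing is deliberately left out; this only covers the normalisation step.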

If you are looking for a brilliant pair of noise-cancelling headphones, I recommend these. If you’re looking for noise-cancelling headphones for your data, I suggest you check out Squirro.