One To Watch #9: Nuclia aims to transform unstructured data search

Search video, text, audio, PDFs...


“Our goal is not to be the 'best' vector search engine in the world. There are a lot of vector search engines on the market right now – we want to be the most useful one; one that really fixes a problem,” says Ramon Navarro, CTO of Barcelona-based Nuclia. Together with fellow co-founder and CEO Eudald Camprubí, Navarro is building an AI-powered search engine and database designed for unstructured data that enables multi-lingual, multi-format searches of not just information, but "concepts" using Natural Language Processing (NLP). This week Nuclia clinched a $5.4 million seed round from Crane Venture Partners and Elaia, and open sourced its novel cloud native database, NucliaDB (albeit cautiously, under the strong copyleft GPLv3 licence).

We're making Nuclia The Stack's latest "One to Watch" -- a startup that really excites us for its potential.

The company is shipping an end-to-end Nuclia API -- capable of connecting to any data source and automatically indexing its content, regardless of format or language -- that lets developers build AI-powered search functions on top of unstructured data. Underpinning this is the freely available NucliaDB database, which lets users deploy their own vectorisation and normalisation algorithms while providing storage, indexing, and querying.

(Commercially, the idea is that users will use NucliaDB to store all their unstructured data, pay for the API and also, should they be inclined, pay for NucliaDB-as-a-Service, hosted on multicloud infrastructure.)

“Where the data is, what format it is, and in what language – it's a nightmare for most companies when they are trying to index and to access that information,” says Camprubí, pointing out that 80-90% of any organisation’s data is unstructured, and spread across different sources: "We connect those data sources, we index the information, if it's a video, we transcribe the video. And we then run our algorithms to extract, first of all, named entities. So we automatically detect names of people, names of organisations, dates, quantities, a lot of different things – then we index all the text, we index all the paragraphs,” notes the Nuclia CEO on a call this week.

He adds: "And then once we have everything we vectorise the information, so meaning that we turn text into vectors, into numbers. We store everything in NucliaDB, which is the database that we have created, which offers vector search together with text search, paragraph search and fuzzy search..."

(NucliaDB is written in Rust and Python and built on the Tantivy library. It's designed to be run on Kubernetes, with eventual consistency transactions based on the Nats.io architecture and TiKV and Redis support.)
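
To make the idea of combining vector search with fuzzy text search concrete, here is a minimal sketch in plain Python. It is not Nuclia's code and does not use the NucliaDB API: the function names, the word-count "vectors" and the 0.7 weighting are all assumptions chosen for illustration, and a real system would use learned embeddings and a proper index.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorise(text):
    """Toy stand-in for an embedding: a sparse word-count vector.

    Assumption: a real system would use a learned language model here.
    """
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuzzy_score(query, text):
    """Best character-level similarity between any query word and any text word."""
    q_tokens, t_tokens = tokenize(query), tokenize(text)
    if not q_tokens or not t_tokens:
        return 0.0
    return max(
        SequenceMatcher(None, q, t).ratio() for q in q_tokens for t in t_tokens
    )


def search(query, paragraphs, vector_weight=0.7):
    """Rank paragraphs by a weighted mix of vector and fuzzy scores.

    Returns (paragraph_index, score) pairs, best match first.
    The vector_weight default is a hypothetical value, not a tuned one.
    """
    query_vector = vectorise(query)
    scored = []
    for idx, paragraph in enumerate(paragraphs):
        score = (
            vector_weight * cosine(query_vector, vectorise(paragraph))
            + (1 - vector_weight) * fuzzy_score(query, paragraph)
        )
        scored.append((idx, score))
    # Highest score first; ties broken by original paragraph order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```

Calling search("vectr search", paragraphs) on a small list of paragraphs returns their indices ordered by relevance. The fuzzy component is what lets a misspelt query word such as "vectr" still match "vector", while the vector component rewards paragraphs that share whole terms with the query.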

Nuclia CEO Eudald Camprubí (l) and CTO Ramon Navarro (r).


There's a growing demand for access to such unstructured organisational data and Nuclia wants organisations to build their own search engines and other applications on top of their technology.

“I was always missing a tool that helped me with search. Search is a super-complex problem; nobody imagines how complex is it until you need to build a search engine,” says Navarro, adding: “We don't know what kind of software is going to be built on top of Nuclia, we cannot imagine how far people will go.

"But the more freedom you can give to developers or to companies to build their own system, it's going to be better – I'm not going to know and understand the problems of clients better than them.”

Nuclia's mixed open/proprietary model

Commercially, Nuclia’s “understanding API” will remain proprietary.

“People will pay by consumption of [for] this understanding API and the training API.... So whatever information you put in the database is going to be useful for training accurate models and to create more information specifically for the database,” says Navarro, emphasising that having Nuclia’s code open source will make it easier to find and attract talented developers, either as employees or users. (Nuclia won't retain user data that is processed by its API, only statistics for accounting; customers can also use NucliaDB without making use of the API.)

The transparency that open source provides is critical, they say: “It's not easy to find ex-Google search engineers, for example. Or it's not easy to find people who really have a lot of experience in this world of searchability, and that can bring the value that we require this at this moment. So for us, it’s good to show that we are able to do it, and people can trust us – because we are a tool for developers. It's difficult to trust it, if you don't see what's behind it."

Nuclia gains $5.4m seed funding

“Nuclia has built something incredible. Imagine being taken to the exact time in a video or podcast, or the exact block in a PDF or presentation, that has the content you are looking for. And then go a step further, searching not only for content, but also concepts,” says Aneel Lakhani, venture partner at Crane -- an early stage VC fund -- in a press release, adding: “We believe the explosion of unstructured data like audio and video will only continue.

"Nuclia is poised to underpin how engineers build search into their apps and services and how modern businesses unleash insight from unstructured data that simply isn’t accessible today.”

The scale of what Nuclia is trying to accomplish is impressive – and indeed, when discussing the firm’s long-term ambitions CEO Camprubí references both Elastic and Algolia as lodestars.

“Our efforts are not to build something super smart. It's to do something super useful. And sometimes in technology this is difficult, because you get too enthusiastic about the technology that you end up building something super complex to be used by developers,” says Navarro: “We don't want to crack the wall, we are not aiming to fix everything. We want to have a tool so people with words, with the way they are expressing, can find any material or information that's in their knowledge – even if it's spoken, even if it is written on an image, or even if it's a PDF or Word document or whatever. And we are super-focused on this problem, on trying to nail this.”

Investing in developers - and users

Nuclia will be using its new investment to expand the team.

Somewhat unusually, Camprubí says that in addition to traditional domain-expert developers with search experience, Nuclia is also looking to attract “citizen developers” who might not have experience in coding, but are building systems and applications using low-code or no-code tools: “Our [focus in] coming months, besides offering this technology to pure developers and stabilising everything, is also to start approaching these other [citizen] developers which are a huge and growing market, and also offering Nuclia to them,” he says.

While Nuclia is a completely remote organisation in terms of hiring talent, the company still has a strong sense of place – both in terms of its tech community and its origins in Barcelona.

Navarro makes clear how vital the founders’ experience has been in creating Nuclia: “We come from a large internet e-discovery world, building super-large search engines with Elastic, and a lot of different search engines – that gave us all the knowledge and all the experience to understand what are the pains of this system.”

Nuclia is the third company that Camprubí and Navarro, who have known each other since childhood, have collaborated on. They founded Iskra.cat, a technology agency, together 11 years ago, and both later worked at Onna.com, as COO and CTO respectively, during its initial years. "It's the best partnership, because I'm super-technological, and Eudald is super-good at business and marketing. Because explaining what vector search is, it's something that looks simple when you know - but when you don't know what it is, it's really complex," says Navarro.

Flying the Barcelona flag

Camprubí also makes clear how proud they are of their Barcelonan heritage, and the fact Nuclia is able to fly the flag for Barcelona tech start-ups. He’s keen to show that local start-ups can contribute something truly unique – something which is not currently happening: "[In Spain] we are missing a bit the concept of inventing, of writing a white paper to invent something new, which is what we are trying. And that's why for us, it's good for being from Barcelona, because inventing wheels from Barcelona is not that easy. We are not copying anything from outside."

But the challenge is not only within the local ecosystem, but also in the mindset of European buyers. Camprubí says too often he sees organisations drawn to software developed outside of their own countries or regions and overlooking homegrown innovations: “'American software is always better than European software'. This is the belief of a lot of administrations. I hope that projects, like ours [and other] deep technology developed in Barcelona, will help change this perception,” he concludes.
