How we talk to a billion records with our AI Navigator

Tim Kreutz
  • about 1 hour ago
  • 5 min read

Dataprovider's AI Navigator transforms how users interact with billions of web data records. Instead of requiring technical expertise to navigate 350 million domains and 200+ data variables, users can now ask questions in natural language and receive instant, accurate insights. Using a multi-agent AI architecture, the system handles everything from simple queries to complex, multi-level analyses—making web intelligence accessible to everyone while maintaining the powerful tools that data experts rely on.

This blog post is adapted from our talk at aiGrunn2025: Interacting with a billion records using (semi-)autonomous agents.

The challenge of navigating a billion records

How do you make sense of 350 million domains, each with over 200 data variables, updated monthly? For data analysts and researchers, the sheer scale of web data presents a fundamental challenge: finding the specific insights you need without drowning in complexity. Traditional search interfaces require precise field knowledge, some understanding of Boolean logic, and often several attempts to get the right result.

At Dataprovider, we've been exploring a different approach. What if you could ask your questions in natural language and get accurate, data-driven answers directly? This thinking led us to develop the AI Navigator, a new conversational gateway to our proprietary data.

The AI Navigator handles both simple questions ("How many websites from Groningen use WordPress?") and multi-level ones ("Which payment providers are most popular among e-commerce sites that sell gym equipment and how did this change over the course of the last year?"), and returns immediate, actionable insights.

We now support in-answer graphs, extended reports and sharing your conversations within the same organization. And new features are being added almost weekly!

From queries to conversations

The traditional approach to database interaction requires users to understand field structures, query syntax, and data relationships. Our Search Engine interface, while powerful, demands a certain level of expertise to fully leverage its capabilities. Users need to know which of our 200+ variables to filter, how to combine them effectively, and how to interpret the results.

The AI Navigator leverages specialized AI agents to handle the technical complexity behind the scenes. When a user asks a question, multiple specialized agents work together to understand the intent, query an extensive knowledge base, generate the appropriate queries, and execute them against our database of over a billion records.

This conversational approach integrates with our existing tools, ensuring consistent insights across the board. Power users can still access the full Search Engine interface for tweaking their analyses, while business users can get quick answers and extensive reports through natural language queries.

The intelligence behind instant answers

The AI Navigator employs a multi-agent architecture, where different specialized agents handle specific aspects of the query process. Think of it as a team of experts, each with their own specialty, working together to answer your questions.

When you ask a question, a knowledge agent first searches our documentation and metadata to understand what data fields and relationships are relevant. This agent ensures that queries align with our actual data structure: for instance, knowing that "Groningen" maps to both a region and a city, and that "WordPress" is a content management system.
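
The mapping step described above can be illustrated with a minimal sketch. It assumes a small in-memory glossary; the field names (`region`, `city`, `cms`), the `GLOSSARY` table, and the `resolve_terms` helper are all hypothetical and do not reflect Dataprovider's actual schema or implementation.

```python
from dataclasses import dataclass

# Hypothetical glossary: maps a user-facing term to candidate (field, value) pairs.
GLOSSARY = {
    "groningen": [("region", "Groningen"), ("city", "Groningen")],
    "wordpress": [("cms", "WordPress")],
}


@dataclass(frozen=True)
class FieldMatch:
    term: str
    field: str
    value: str


def resolve_terms(question: str) -> list[FieldMatch]:
    """Return every glossary entry whose term occurs in the question."""
    lowered = question.lower()
    matches = []
    for term, candidates in GLOSSARY.items():
        if term in lowered:
            matches.extend(FieldMatch(term, f, v) for f, v in candidates)
    return matches


matches = resolve_terms("How many websites from Groningen use WordPress?")
```

Note that "Groningen" yields two candidates here. A real system would still have to decide between them, or ask the user which one is meant.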

A query generation agent then translates the question into the precise query language our systems understand. This agent validates field availability and ensures syntactic correctness before passing the query forward.
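
As a rough sketch of what validating field availability could look like, the snippet below rejects any filter on a field that is not in a known schema before a query string is built. The `KNOWN_FIELDS` set and the `field = "value"` syntax are assumptions made for illustration, not the query language our systems actually use.

```python
# Hypothetical subset of available fields; the real schema has 200+ variables.
KNOWN_FIELDS = {"country", "region", "city", "cms"}


def validate_filters(filters: dict[str, str]) -> list[str]:
    """Return the filter fields that are not part of the known schema."""
    return sorted(field for field in filters if field not in KNOWN_FIELDS)


def build_query(filters: dict[str, str]) -> str:
    """Build a query string, refusing to proceed if any field is unknown."""
    unknown = validate_filters(filters)
    if unknown:
        raise ValueError(f"Unknown fields: {', '.join(unknown)}")
    # Illustrative syntax only: one clause per filter, joined with AND.
    return " AND ".join(
        f'{field} = "{value}"' for field, value in sorted(filters.items())
    )


query = build_query({"city": "Groningen", "cms": "WordPress"})
# query is: city = "Groningen" AND cms = "WordPress"
```

Failing early on an unknown field keeps a malformed query from ever reaching the database, which is the point of validating before passing the query forward.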

An orchestration agent then develops and executes a plan to answer your question. It might determine that your question requires multiple lookups, aggregations, or trend analyses. For complex questions, it can chain multiple operations together: for example, pulling hostname data from our Traffic Index¹, getting the IP address for a hostname, and then doing a reverse DNS lookup² to map the site's server architecture.

¹ Traffic Index: used to analyze website traffic patterns and popularity metrics for domains.

² Reverse DNS: used to map an IP address back to a hostname or domain name.
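
The chaining idea can be sketched as a list of steps where each step consumes the previous step's output. This is a simplified illustration, not our orchestration agent: `run_chain` and the `Step` type are hypothetical names, and the Traffic Index lookup is left out because it is proprietary. The two lookups use Python's standard `socket` module and need network access when called.

```python
import socket
from typing import Callable

# A step takes the previous result and returns the next one.
Step = Callable[[str], str]


def resolve_ip(hostname: str) -> str:
    """Forward lookup: hostname -> IPv4 address (requires network access)."""
    return socket.gethostbyname(hostname)


def reverse_dns(ip: str) -> str:
    """Reverse lookup: IP address -> primary hostname (requires network access)."""
    return socket.gethostbyaddr(ip)[0]


def run_chain(start: str, steps: list[Step]) -> list[str]:
    """Feed each step's output into the next and keep every intermediate result."""
    results = [start]
    for step in steps:
        results.append(step(results[-1]))
    return results
```

Calling `run_chain("example.com", [resolve_ip, reverse_dns])` would return the hostname, its IP address, and the reverse DNS name in order. Keeping the intermediate results makes it possible to show users how an answer was derived.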

Balancing automation with accuracy

One of the key design decisions in developing the AI Navigator was finding the right balance between autonomous operation and controlled accuracy. Pure automation might lead to creative but potentially incorrect interpretations, while rigid sequential processing could limit the system's ability to handle complex, multi-faceted questions.

We've implemented a hybrid approach. For straightforward queries with clear data mappings, the system operates with high autonomy, providing rapid responses. For more complex questions that might require interpretation or multiple data sources, the system follows more structured workflows to ensure accuracy.
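
One way to picture this hybrid routing is a small decision function. The criteria below (number of mapped fields, ambiguous terms, and data sources) are assumptions chosen for illustration; the actual signals the AI Navigator uses are not described here.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "autonomous"  # rapid response, minimal guardrails
    STRUCTURED = "structured"  # fixed workflow with extra checks


def choose_mode(mapped_fields: int, ambiguous_terms: int, data_sources: int) -> Mode:
    """Go autonomous only when every term maps cleanly and one source suffices."""
    if mapped_fields > 0 and ambiguous_terms == 0 and data_sources == 1:
        return Mode.AUTONOMOUS
    return Mode.STRUCTURED


simple = choose_mode(mapped_fields=2, ambiguous_terms=0, data_sources=1)
complex_ = choose_mode(mapped_fields=3, ambiguous_terms=1, data_sources=2)
```

Defaulting to the structured path means that any question the heuristic is unsure about gets the more careful treatment.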

Ongoing development based on real use cases

AI Navigator is a beta product, and at this stage we are eager to find out how it will be used. Seeing real use cases serves two purposes: it helps us understand how to personalize its answers to users in the future, and it gives us a better handle on when and why it might not find the answer you are looking for.

We are already very excited about its potential. Marketing teams might use it to quickly identify technology adoption trends in specific industries. Security professionals may ask it to "Show me all domains sharing the same SSL certificate as this suspicious site". Investment analysts can find value in AI Navigator’s ability to quickly validate market assumptions. They can ask about technology migrations, compare adoption rates across regions, or track changes in company digital footprints. Domain registries and registrars can quickly understand their namespaces better by asking questions about domain usage patterns, technology adoption within their TLD, or identification of potentially problematic registrations.

A timeline of conversational data access

This represents just the beginning of how conversational AI can transform access to structured web data. As we continue to refine the AI Navigator based on user feedback and usage patterns, we're exploring several enhancement areas.

Integration with our existing products, such as Recipes, Ownership tracking, and Know-Your-Customer datasets, will create even more powerful and personalized analytical capabilities.

We're also investigating ways to make the AI Navigator more proactive. Rather than just answering questions, it could suggest related insights, identify unusual patterns, or alert users to significant changes in their areas of interest. This evolution from reactive to proactive intelligence could fundamentally change how organizations monitor and understand the digital landscape.

The underlying multi-agent architecture provides flexibility for future expansion. New specialized agents can be added to handle emerging data types or analytical methods without disrupting the existing system. This modularity ensures the AI Navigator can evolve alongside our expanding tools and changing user needs.
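
A registry pattern is one common way to get this kind of modularity. The sketch below is an assumption about how such a design could look, not a description of our codebase; the `register` decorator, the `AGENTS` table, and both example agents are hypothetical.

```python
from typing import Callable

# An agent is modeled as a function from a question to an answer.
Agent = Callable[[str], str]
AGENTS: dict[str, Agent] = {}


def register(name: str) -> Callable[[Agent], Agent]:
    """Decorator that adds an agent to the registry under a unique name."""
    def decorator(agent: Agent) -> Agent:
        if name in AGENTS:
            raise ValueError(f"Agent '{name}' is already registered")
        AGENTS[name] = agent
        return agent
    return decorator


@register("knowledge")
def knowledge_agent(question: str) -> str:
    return f"fields relevant to: {question}"


@register("trend")  # a new agent is added without touching existing ones
def trend_agent(question: str) -> str:
    return f"trend analysis for: {question}"


def dispatch(name: str, question: str) -> str:
    """Route a question to the named agent."""
    return AGENTS[name](question)
```

Adding a new capability then amounts to writing one function and decorating it, while `dispatch` and the existing agents stay unchanged.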

Transforming complexity into clarity

The AI Navigator embodies our mission to make web intelligence accessible to everyone who needs it. By removing the technical barriers to data access, we're enabling more people within organizations to leverage web intelligence for decision-making.

Still, we don’t diminish the value of expertise. Our range of detail-oriented analysis tools keeps expanding, and the volume and dimensionality of our records are ever-growing. As we continue to structure and index the global web, tools like the AI Navigator ensure this vast repository of information remains accessible, actionable, and valuable. We don’t want our database to be an overwhelming ocean of information but a way to get precise and quick insights when and how you need them.

For more insights and developments around web data intelligence, subscribe to our newsletter.
