Named Entity Recognition (NER), Entity Disambiguation, Entity Linking, and Knowledge Graph Creation

Case Study
PROJECT MISSION: CONNECTING UNSTRUCTURED DATA

Entity disambiguation and entity linking on a corpus of historical data to connect it semantically in a knowledge graph.

Outcomes

1.2M

distinct entities extracted

9M

queryable relationships

About The Project

The Rockefeller Foundation’s mission is to promote the well-being of humanity throughout the world through advances in science, data, policy, and innovation that address global challenges in health, food, power, and economic mobility. Since 1913, the Foundation has hosted numerous convenings and awarded thousands of grants in pursuit of that mission, generating a large volume of unstructured data.

The Foundation sought to connect its unstructured and structured data to external sources, extracting and linking entities as input to a knowledge graph that could answer questions such as: Which people attend different events together? Whom have we funded, and where do they get their next round of funding?

The Foundation engaged Predictive UX to develop a Proof of Concept (POC) Natural Language Processing (NLP) solution for Named Entity Recognition (NER), along with a graph database, to enable insight discovery across internal and external sources of data.

The hypothesis was that the Foundation could use the knowledge graph to gain intelligence about grants awarded by other organizations, as well as about grantees, convenings, and other events.

To achieve the goals of the Foundation, we proposed an NER pipeline architecture and led data modeling, NER pipeline development, entity extraction, and knowledge graph implementation.

The NER pipeline allowed us to extract entities such as people, places, organizations, funding amounts, and more. The pipeline included a Named Entity Disambiguation (NED) step that associates a name with a single entity even when it is written differently across contexts (e.g., Elizabeth Baker, Liz Baker, and Liz M. Baker), thus reducing duplicate data.
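To illustrate the idea behind the NED step, the minimal Python sketch below groups name variants under a shared key. It is an illustration under stated assumptions, not the pipeline built for the Foundation: the NICKNAMES table and the functions name_key and group_mentions are hypothetical, and matching on first and last name alone would wrongly merge different people who share a name, which is why a real NED step also weighs the surrounding context.

```python
import re
from collections import defaultdict

# Hypothetical nickname table; a real system would need a far larger resource.
NICKNAMES = {"liz": "elizabeth", "beth": "elizabeth", "bill": "william"}


def name_key(raw_name: str) -> tuple[str, str]:
    """Reduce a personal name to a (first, last) key, ignoring middle names."""
    tokens = re.findall(r"[a-z]+", raw_name.lower())
    if not tokens:
        raise ValueError(f"no alphabetic tokens in {raw_name!r}")
    # Map a known nickname to its canonical form; otherwise keep the token.
    first = NICKNAMES.get(tokens[0], tokens[0])
    last = tokens[-1]
    return first, last


def group_mentions(mentions: list[str]) -> dict[tuple[str, str], list[str]]:
    """Cluster raw mentions that share the same normalized key."""
    clusters = defaultdict(list)
    for mention in mentions:
        clusters[name_key(mention)].append(mention)
    return dict(clusters)


mentions = ["Elizabeth Baker", "Liz Baker", "Liz M. Baker", "William Baker"]
clusters = group_mentions(mentions)
for key, variants in clusters.items():
    print(key, variants)
```

Here the three Baker variants from the example above fall into one cluster, while the fourth mention stays separate because its first name differs.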

This project extracted 1.2M entities and 9M queryable relationships and contributed to the Foundation’s long-term knowledge graph strategy.

Client

The Rockefeller Foundation advances the well-being of humanity by funding scientific, policy, and data-driven solutions to global challenges in health, food, energy, and economic opportunity. Since 1913, it has awarded tens of thousands of grants to individuals and organizations worldwide.

What We Did

We designed and implemented an NLP-powered pipeline to resolve entities and relationships across more than 100,000 internal and external grant records. Our work included data preprocessing, spaCy pipelines, coreference resolution, weak labeling with Snorkel, and integration into a Neo4j graph database.
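For readers unfamiliar with weak labeling, the sketch below shows the general idea in plain Python: several noisy heuristic rules each vote on a label or abstain, and the votes are combined. It deliberately does not use the Snorkel API, and it swaps in a simple majority vote for Snorkel's label model; the labeling functions, label constants, and keyword lists are all hypothetical and are not taken from the project.

```python
from collections import Counter

# Hypothetical label constants for this sketch.
ABSTAIN, PERSON, ORG = -1, 0, 1


def lf_org_suffix(mention: str) -> int:
    """Vote ORG when the mention ends with a typical organization word."""
    suffixes = ("foundation", "university", "institute")
    return ORG if mention.lower().endswith(suffixes) else ABSTAIN


def lf_honorific(mention: str) -> int:
    """Vote PERSON when the mention starts with an honorific."""
    honorifics = ("dr. ", "prof. ", "mr. ", "ms. ")
    return PERSON if mention.lower().startswith(honorifics) else ABSTAIN


def lf_the_prefix(mention: str) -> int:
    """Vote ORG when the mention starts with 'The'."""
    return ORG if mention.lower().startswith("the ") else ABSTAIN


LABELING_FUNCTIONS = [lf_org_suffix, lf_honorific, lf_the_prefix]


def weak_label(mention: str) -> int:
    """Combine the non-abstaining votes by majority; abstain on ties."""
    votes = [lf(mention) for lf in LABELING_FUNCTIONS]
    votes = [vote for vote in votes if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]


for text in ["The Rockefeller Foundation", "Dr. Elizabeth Baker", "Bellagio"]:
    print(text, weak_label(text))
```

Labels produced this way are noisy by design; they are typically used to train a downstream model rather than written to the graph directly.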

Outcomes

Our pipeline extracted over 1.2 million distinct entities and identified more than 9 million relationships, unlocking new visibility into grantee activity and funding patterns. By unifying grant data through entity resolution, we enabled faster, more accurate insights for strategic decision-making.
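One kind of relationship query the Foundation asked about is which people attend the same events. In the graph database this would be a graph query; the sketch below uses plain Python over an in-memory list only to show the logic. The attendance records, the placeholder names "Person B" and "Person C", and the function co_attendance are hypothetical and do not reflect the Foundation's data or schema.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, event) records standing in for graph edges.
attendance = [
    ("Elizabeth Baker", "Convening A"),
    ("Person B", "Convening A"),
    ("Person B", "Convening B"),
    ("Elizabeth Baker", "Convening B"),
    ("Person C", "Convening B"),
]


def co_attendance(records):
    """Map each pair of people to the set of events they both attended."""
    attendees_by_event = defaultdict(set)
    for person, event in records:
        attendees_by_event[event].add(person)

    shared_events = defaultdict(set)
    for event, people in attendees_by_event.items():
        # Sort so each pair has one canonical ordering.
        for first, second in combinations(sorted(people), 2):
            shared_events[(first, second)].add(event)
    return dict(shared_events)


pairs = co_attendance(attendance)
for pair, events in pairs.items():
    print(pair, sorted(events))
```

Pairs that share more than one event, such as the first two people in this sample, are the sort of repeated connection a knowledge graph makes easy to surface.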

Data Goals

Selecting Better Residents

Each year, the Foundation connects unlikely people in residencies at the Bellagio Center on Lake Como to advance breakthrough solutions. They wanted to know how entity extraction and semantic data could help them better select Residents.

Maximizing Connections

For more than a century, The Rockefeller Foundation has hosted convenings as a way for stakeholders and leaders to merge their visions and ideas into actions that change the world. They wanted to maximize connections during convenings and sought to understand how this approach to data could be useful, perhaps in providing people with knowledge about each other ahead of convenings.

Identifying Insights, Key Themes

Are there knowledge insights, connections, or themes that are missed during convenings? Could this approach enhance our ability to process what’s happening and synthesize it afterwards?

Our Approach

Entity Disambiguation and Knowledge Graph Ingestion Flow

This diagram shows the data pipeline designed during Predictive UX’s work with The Rockefeller Foundation. It outlines how raw documents are processed through a named entity recognition (NER) and coreference resolution pipeline, followed by topic and relationship extraction, before being integrated into a structured knowledge graph. The flow visualizes the architecture of the proof-of-concept used to link people, organizations, and topics across unstructured grant and news data.

Figure: Documents are processed through named entity recognition, coreference resolution, topic extraction, and relationship models to build a knowledge graph in Neo4j. The process includes data cleaning and entity linking, and it outputs both entity relationships and unified records.
