Named Entity Recognition (NER), Entity Disambiguation, Entity Linking, and Knowledge Graph Creation
Entity disambiguation and entity linking on a corpus of historical data to connect it semantically in a knowledge graph.
Outcomes
1.2M
distinct entities extracted
9M
queryable relationships
About The Project
The Rockefeller Foundation’s mission is to promote the well-being of humanity throughout the world through advances in science, data, policy, and innovation to solve global challenges related to health, food, power, and economic mobility. Since 1913, the Foundation has hosted numerous convenings and awarded thousands of grants in pursuit of this mission, generating a large body of unstructured data.
The Foundation sought to connect its unstructured and structured data to external sources, extracting and linking entities as input to a knowledge graph that could surface insights: Which people attended events together? Whom have we funded, and where did they receive their next round of funding?
The Foundation engaged Predictive UX to develop a Proof of Concept (POC) Natural Language Processing (NLP) solution for Named Entity Recognition (NER) and a graph database to enable insight discovery across internal and external sources of data.
The hypothesis was that the Foundation could use the knowledge graph to gain intelligence about grants issued by other organizations, grantees, convenings, and other events.
To achieve the Foundation’s goals, we proposed an NER pipeline architecture and led data modeling, pipeline development, entity extraction, and knowledge graph implementation.
The NER pipeline allowed us to extract entities such as people, places, organizations, funding amounts, and more. The pipeline included a Named Entity Disambiguation (NED) step to confidently associate names even when they appear differently across contexts (e.g., Elizabeth Baker, Liz Baker, and Liz M. Baker), reducing duplicate data.
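The name-variant matching described above can be sketched in a few lines. This is a minimal illustration, not the production logic: the nickname table and the rule that drops middle initials are illustrative assumptions standing in for the project’s actual disambiguation step.

```python
# Illustrative NED sketch: collapse name variants to one canonical key.
# The nickname table below is a small hypothetical sample.
NICKNAMES = {"liz": "elizabeth", "beth": "elizabeth", "bill": "william"}

def normalize(name: str) -> tuple[str, str]:
    """Reduce a personal name to (canonical first name, last name),
    dropping middle tokens such as the initial in 'Liz M. Baker'."""
    parts = [p.strip(".").lower() for p in name.split()]
    first = NICKNAMES.get(parts[0], parts[0])
    return first, parts[-1]

def same_person(a: str, b: str) -> bool:
    # Two mentions merge into one entity when their keys match.
    return normalize(a) == normalize(b)

# "Elizabeth Baker", "Liz Baker", and "Liz M. Baker" all collapse
# to ("elizabeth", "baker") and would be merged into one node.
print(same_person("Elizabeth Baker", "Liz M. Baker"))  # True
```

In practice a step like this is combined with contextual signals (shared organizations, co-occurring topics) before two mentions are merged with confidence.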
The project extracted 1.2M distinct entities and 9M queryable relationships, and contributed to the Foundation’s long-term knowledge graph strategy.
Client
The Rockefeller Foundation advances the well-being of humanity by funding scientific, policy, and data-driven solutions to global challenges in health, food, energy, and economic opportunity. Since 1913, it has awarded tens of thousands of grants to individuals and organizations worldwide.
What We Did
We designed and implemented an NLP-powered pipeline to resolve entities and relationships across over 100,000 internal and external grant records. Our work included data preprocessing, spaCy pipelines, coreference resolution, weak labeling with Snorkel, and integration into a Neo4j graph database.
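The weak-labeling step mentioned above can be illustrated with Snorkel-style labeling functions. This is a self-contained sketch under stated assumptions: real Snorkel fits a generative label model over the labeling functions’ outputs, whereas a simple majority vote stands in here, and the functions themselves are hypothetical, not the ones used in the project.

```python
# Snorkel-style weak labeling sketch: several noisy heuristics vote
# on an entity mention's type; majority vote stands in for Snorkel's
# LabelModel so the example stays self-contained.
ORG, PERSON, ABSTAIN = 1, 0, -1

def lf_org_suffix(mention: str) -> int:
    # Organizations often end in a known suffix word.
    return ORG if mention.split()[-1] in {"Foundation", "Inc", "University"} else ABSTAIN

def lf_title_prefix(mention: str) -> int:
    # Personal titles strongly suggest a person.
    return PERSON if mention.split()[0] in {"Dr.", "Prof.", "Ms.", "Mr."} else ABSTAIN

def lf_two_capitalized_words(mention: str) -> int:
    # "Firstname Lastname" shape weakly suggests a person.
    parts = mention.split()
    if len(parts) == 2 and all(p[0].isupper() for p in parts):
        return PERSON
    return ABSTAIN

def weak_label(mention: str) -> int:
    votes = [lf(mention) for lf in (lf_org_suffix, lf_title_prefix, lf_two_capitalized_words)]
    votes = [v for v in votes if v != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

print(weak_label("The Rockefeller Foundation"))  # 1 (ORG)
print(weak_label("Dr. Jane Goodall"))            # 0 (PERSON)
```

Weak labels like these let a pipeline train or bootstrap an entity-type classifier without hand-annotating every one of the 100,000+ records.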
Outcomes
Our pipeline extracted over 1.2 million distinct entities and identified more than 9 million relationships, unlocking new visibility into grantee activity and funding patterns. By unifying grant data through entity resolution, we enabled faster, more accurate insights for strategic decision-making.
Data Goals
Selecting Better Residents
Maximizing Connections
Identifying Insights, Key Themes
Our Approach
Entity Disambiguation and Knowledge Graph Ingestion Flow
This diagram shows the data pipeline designed during Predictive UX’s work with The Rockefeller Foundation. It outlines how raw documents are processed through a named entity recognition (NER) and coreference resolution pipeline, followed by topic and relationship extraction, before being integrated into a structured knowledge graph. The flow visualizes the architecture of the proof-of-concept used to link people, organizations, and topics across unstructured grant and news data.
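The final stage of the flow, loading extracted relationships into the knowledge graph, can be sketched as converting (subject, relation, object) triples into Cypher statements for Neo4j. The entity label, relation names, and example triples below are illustrative assumptions; the production schema may differ.

```python
# Sketch: turn extracted triples into idempotent Cypher MERGE
# statements for Neo4j ingestion. MERGE (rather than CREATE) means
# re-running the pipeline never duplicates nodes or relationships.
def triple_to_cypher(subj: str, rel: str, obj: str) -> str:
    return (
        f'MERGE (a:Entity {{name: "{subj}"}}) '
        f'MERGE (b:Entity {{name: "{obj}"}}) '
        f'MERGE (a)-[:{rel}]->(b)'
    )

# Hypothetical triples of the kind the pipeline emits.
triples = [
    ("The Rockefeller Foundation", "FUNDED", "Example Grantee"),
    ("Example Grantee", "ATTENDED", "Example Convening"),
]
for t in triples:
    print(triple_to_cypher(*t))
```

Once loaded, questions like “which people attended events together?” become short graph queries over the `ATTENDED` relationships rather than searches across unstructured documents.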