Tech Wavo

New project makes Wikipedia data more accessible to AI

by Tech Wavo
October 1, 2025
in Computers

On Wednesday, Wikimedia Deutschland announced a new database that will make Wikipedia’s wealth of knowledge more accessible to AI models.

Called the Wikidata Embedding Project, the system applies vector-based semantic search, a technique that helps computers understand the meaning of words and the relationships between them, to the existing data on Wikipedia and its sister platforms, which comprises nearly 120 million entries.

Combined with new support for the Model Context Protocol (MCP), a standard that helps AI systems communicate with data sources, the project makes the data more accessible to natural language queries from LLMs.

The project was undertaken by Wikimedia’s German branch in collaboration with the neural search company Jina.AI and DataStax, a real-time training-data company owned by IBM.

Wikidata has offered machine-readable data from Wikimedia properties for years, but the pre-existing tools only allowed for keyword searches and SPARQL queries, a specialized query language. The new system will work better with retrieval-augmented generation (RAG) systems that allow AI models to pull in external information, giving developers a chance to ground their models in knowledge verified by Wikipedia editors.
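For readers new to retrieval-augmented generation, the grounding step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the facts, their made-up 3-D vectors, and the helper names `retrieve` and `build_grounded_prompt` are all hypothetical. A real deployment would query a vector store and send the finished prompt to an LLM.

```python
import math

# Hypothetical facts standing in for Wikidata statements, each paired with an
# invented 3-D embedding. A real pipeline would fetch both from a vector store.
FACTS = [
    ("Marie Curie was a physicist and chemist.", [0.9, 0.1, 0.0]),
    ("Bell Labs is a research laboratory.", [0.7, 0.4, 0.1]),
    ("Sourdough is a type of bread.", [0.0, 0.1, 0.95]),
]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, facts, top_k=2):
    """Return the text of the top_k facts most similar to the query vector."""
    ranked = sorted(
        facts,
        key=lambda fact: cosine_similarity(query_vector, fact[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

def build_grounded_prompt(question, query_vector, facts, top_k=2):
    """Prepend retrieved facts so the model answers from them, not memory alone."""
    context = "\n".join(f"- {fact}" for fact in retrieve(query_vector, facts, top_k))
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )

# The query vector is also invented; in practice it would come from embedding the question.
print(build_grounded_prompt("Which scientists are mentioned?", [0.85, 0.25, 0.05], FACTS))
```

The unrelated "Sourdough" fact scores far lower than the other two and is left out of the prompt, which is the point of retrieval: the model sees only the material most relevant to the question.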

The data is also structured to provide crucial semantic context. Querying the database for the word “scientist,” for instance, will produce lists of prominent nuclear scientists as well as scientists who worked at Bell Labs. There are also translations of the word “scientist” into different languages, a Wikimedia-cleared image of scientists at work, and extrapolations to related concepts like “researcher” and “scholar.”
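Concept matching of the kind described above, where "scientist" surfaces "researcher" and "scholar", typically comes from comparing embedding vectors. The Python sketch below ranks concepts by cosine similarity. It is an assumption-laden toy: the vectors and the `related_concepts` helper are invented for illustration and are not the Wikidata Embedding Project's data or API, whose real embeddings are model-generated and have far more dimensions.

```python
import math

# Hypothetical toy embeddings with invented 3-D values, chosen only to show
# the mechanics of similarity ranking.
CONCEPTS = {
    "scientist": [0.9, 0.1, 0.0],
    "researcher": [0.85, 0.2, 0.05],
    "scholar": [0.7, 0.3, 0.1],
    "bakery": [0.0, 0.2, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def related_concepts(query, concepts, top_k=2):
    """Rank the other concepts by similarity to the query concept's vector."""
    query_vector = concepts[query]
    scored = [
        (label, cosine_similarity(query_vector, vector))
        for label, vector in concepts.items()
        if label != query  # skip the query concept itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in scored[:top_k]]

print(related_concepts("scientist", CONCEPTS))  # prints ['researcher', 'scholar']
```

Because similarity is measured by vector direction rather than shared spelling, "researcher" and "scholar" rank close to "scientist" even though none of the words overlap, while "bakery" lands at the bottom.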

The database is publicly accessible on Toolforge. Wikidata is also hosting a webinar for interested developers on October 9th.

The new project comes as AI developers scramble for high-quality data sources that can be used to fine-tune models. The training systems themselves have grown more sophisticated, often assembled as complex training environments rather than simple datasets, but they still require closely curated data to function well. For deployments that demand high accuracy, the need for reliable data is particularly urgent. While some might look down on Wikipedia, its data is significantly more fact-oriented than catchall datasets like Common Crawl, a massive collection of web pages scraped from across the internet.

In some cases, the push for high-quality data can have expensive consequences for AI labs. In August, Anthropic offered to settle a lawsuit with a group of authors whose works had been used as training material, agreeing to pay $1.5 billion to end any claims of wrongdoing.

In a statement to the press, Wikidata AI project manager Philippe Saadé emphasized the project's independence from major AI labs and large tech companies. "This Embedding Project launch shows that powerful AI doesn't have to be controlled by a handful of companies," Saadé said. "It can be open, collaborative, and built to serve everyone."
