Refining Patent Searches via Machine Learning Models

Google Develops Phrase Database for Patent Search Models Enhancement

Refining Artificial Intelligence Algorithms for Patent Research

In the realm of patent search, a comprehensive and accurately labelled dataset is crucial for effective model training. Google has addressed this with its Patent Phrase Similarity dataset, a publicly released collection of labelled phrase pairs designed specifically for training patent search models. For bulk raw patent text, however, practitioners still need to assemble their own corpora.

For those seeking bulk patent data, a common approach is to use third-party tools or open-source scripts, such as those found on GitHub, which automate the downloading of multiple patent documents from Google Patents or databases like Espacenet. These scripts mimic user actions with browser automation tools such as Selenium with ChromeDriver to download the associated patent PDFs in batches.
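The batch-download workflow just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a production scraper: the `patent_urls` and `download_batch` helpers are hypothetical names, Selenium and a matching ChromeDriver must be installed separately, and the site-specific step of locating the PDF link on each page is deliberately left as a placeholder because Google Patents' page structure can change.

```python
from typing import List


def patent_urls(pub_numbers: List[str]) -> List[str]:
    """Build Google Patents page URLs from publication numbers."""
    return [f"https://patents.google.com/patent/{num}/en" for num in pub_numbers]


def download_batch(urls: List[str], download_dir: str) -> None:
    """Visit each patent page with a headless browser so attached PDFs
    can be saved. Requires `pip install selenium` plus a ChromeDriver."""
    from selenium import webdriver  # imported lazily; optional dependency

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    # Route downloads to a known directory instead of the browser default.
    options.add_experimental_option(
        "prefs", {"download.default_directory": download_dir}
    )

    driver = webdriver.Chrome(options=options)
    try:
        for url in urls:
            driver.get(url)
            # Site-specific logic to locate and click the PDF link goes here.
    finally:
        driver.quit()
```

For example, `patent_urls(["US9876543B2"])` produces the single page URL for that publication, which `download_batch` would then visit.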

While these approaches don't provide a "dataset of phrases," they do allow for the collection of large volumes of patent text, essential for further processing. For instance, the European Patent Office's Espacenet offers free access to over 130 million worldwide patent documents, providing a substantial resource for those in the patent search field.

To create a dataset of phrases or queries, the typical approach is to collect patent full texts in bulk, extract phrases or terms of interest using natural language processing (NLP) techniques, and curate your own dataset from the raw patent texts.
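As a concrete sketch of the phrase-extraction step, the snippet below pulls frequent multi-word candidate terms out of raw patent text using only the Python standard library. It is an assumption-laden toy: a real pipeline would use a proper NLP toolkit, and the stopword list here is a small illustrative sample, not a complete one.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "for", "is", "said"}


def extract_candidate_phrases(
    text: str, n: int = 2, top_k: int = 10
) -> List[Tuple[str, int]]:
    """Return the top_k most frequent n-word phrases with no stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Slide an n-token window over the text and count stopword-free grams.
    counts = Counter(
        " ".join(gram)
        for gram in zip(*(tokens[i:] for i in range(n)))
        if not any(tok in STOPWORDS for tok in gram)
    )
    return counts.most_common(top_k)
```

Run on a claim-like sentence such as "a spherical recreation device comprising a spherical recreation device", it surfaces "spherical recreation" and "recreation device" as the most frequent candidate terms.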

It's important to note that many patent owners use non-standard language to describe their inventions, which can produce widely varied and impractical search results. To counter this, Google's dataset contains approximately 50,000 phrase-to-phrase pairs, each labelled to denote how the phrases relate to one another, with relationship labels including exact match, synonym, and unrelated.
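To make the pair structure concrete, the snippet below models a few labelled phrase-to-phrase pairs and a lookup helper of the kind a training loop might use. The example pairs and the exact label strings are illustrative assumptions in the spirit of the scheme described above, not actual rows from the dataset.

```python
from typing import Dict, List, Tuple

# Illustrative labelled pairs following the label scheme described above
# (exact match, synonym, unrelated); these are not real dataset rows.
PHRASE_PAIRS: List[Tuple[str, str, str]] = [
    ("soccer ball", "spherical recreation device", "synonym"),
    ("soccer ball", "soccer ball", "exact"),
    ("soccer ball", "semiconductor wafer", "unrelated"),
]


def build_index(pairs: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], str]:
    """Index pairs by (anchor, target) so the relationship label for any
    known phrase pair can be looked up in constant time."""
    return {(anchor, target): label for anchor, target, label in pairs}


index = build_index(PHRASE_PAIRS)
```

With this index, `index[("soccer ball", "spherical recreation device")]` returns `"synonym"`, which is exactly the supervision signal a phrase-similarity model trains against.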

The image used in this article is a user-submitted photo from Flickr, courtesy of Nick Normal; it is not part of the patent search dataset. It simply illustrates the diverse and creative language found in patents, such as describing a soccer ball as a "spherical recreation device."

In conclusion, Google's labelled phrase dataset gives model builders a ready starting point, but the practical approach for building broader training datasets still involves collecting patent texts in bulk, extracting phrases of interest, and curating your own dataset. Combined with large patent databases like Espacenet, this provides a valuable resource for anyone looking to improve the accuracy of patent search models.

In practice, constructing a dataset for patent search models means first using data and cloud-computing tools, such as third-party services, open-source scripts, or Selenium with ChromeDriver, to collect patent full texts in bulk. Natural language processing (NLP) techniques can then be applied to extract phrases of interest and curate a custom dataset.

With such NLP techniques and large-scale patent sources like Espacenet, it becomes possible to build a dataset of roughly 50,000 labelled phrase-to-phrase pairs, improving the accuracy of patent search models.
