This article explains how to use the PDF keyword find-and-replace feature of an office batch-processing tool to delete variable text such as dates, years, and months from multiple PDF files at once, using wildcard-style or formula-based fuzzy matching. In the example, four PDFs, 1.pdf through 4.pdf, are processed. The original files contain text such as "April" and "2017". After processing, the matched keywords are deleted and the rest of the text is left intact. The approach suits batch cleanup of sensitive information and repeated fields in reports, contracts, and document packages.
When organizing PDF reports, contracts, audit materials, or documents for external release, a troublesome issue often arises: the text to be deleted is not completely fixed. For example, some PDFs contain "April 13, 2017," others say "May 20, 2018," and still others have different years, months, or numbers. Opening each PDF one by one to find and delete this text manually is time-consuming, and it is easy to miss occurrences. This article addresses that problem: using wildcards or formula-based fuzzy search to batch delete keywords in multiple PDF files.
As seen in the screenshot, the software used is "HeSoft Doc Batch Tool". It is a document batch processing software designed for office scenarios. Its core value lies in consolidating repetitive document processing actions into a single workflow. For keyword cleanup in PDF files, it offers the "Find and Replace Keywords in PDF" function. By adding multiple PDFs to the task list, setting the keyword search rules, and leaving the replacement content blank, you can batch delete PDF text content.
Applicable Scenarios: Which PDFs are suitable for batch keyword deletion using wildcards
The need to batch delete PDF keywords is common in daily office work. A company may need to send a batch of PDF reports externally and must first delete the month and year from the report date. A legal department may need to remove client names, ID numbers, reference numbers, or amounts from contract PDFs. Administrative staff may need to delete old dates from multiple PDF notices in a uniform way. When archiving materials, version numbers, project codes, or batch numbers that appear repeatedly in the text may need to be removed.
If the text to be deleted is identical, ordinary exact find-and-replace can handle it. But when the keywords vary, fuzzy search is needed. For instance, the month might be April or May, and the year could be four-digit numbers like 2017, 2018, 2026. In such cases, notations similar to wildcards, formulas, or regular expressions allow the software to match a type of text, rather than just one fixed term. The screenshot uses "Use formula for fuzzy text search" and enters April|May and \d{4} in the keyword list to match April or May, and four-digit years.
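The two rules above can be tried outside the software to see what each one accepts. The following is a minimal sketch that assumes the tool's "formula" syntax behaves like Python's `re` module for these two patterns; the article does not confirm which regular-expression engine the software uses, and the helper name `matches_rule` is made up for illustration.

```python
import re

# Rules copied from the keyword list in the screenshot.
MONTH_RULE = r"April|May"
YEAR_RULE = r"\d{4}"


def matches_rule(rule, token):
    """Return True if the whole token is matched by the rule (hypothetical helper)."""
    return re.fullmatch(rule, token) is not None


# Check a few tokens against both rules.
for token in ["April", "May", "June", "2017", "2026", "13"]:
    print(token, matches_rule(MONTH_RULE, token), matches_rule(YEAR_RULE, token))
```

Under these assumptions, "April" and "May" satisfy the month rule while "June" does not, and "2017" and "2026" satisfy the year rule while the two-digit "13" does not.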
It is important to note that the goal here is to delete text keywords within the PDF, not to delete entire pages or the PDF file itself. The software locates matching text in the PDF content based on the search rules and performs the replacement as configured; when the replacement keyword list is empty, it effectively deletes the matched content.
Effect Preview: Before processing, multiple PDFs contain keywords needing cleanup in their body text
Before processing, there are 4 PDF files in the folder: 1.pdf, 2.pdf, 3.pdf, and 4.pdf. For such multi-file tasks, if you manually open each one and search for content like April, May, and years individually, the workload increases rapidly with the number of files.

After opening one PDF, you can see a date entry in the page: April 13, 2017. The screenshot highlights April and 2017 with red boxes and uses arrows to point to the positions needing processing. The processing goal here is not to delete the entire date segment but to use fuzzy rules to delete the month and year, preserving content like "13," that doesn't need deletion.

This example effectively illustrates the value of "batch fuzzy deletion of PDF keywords using wildcards." Since the month and year may differ across PDFs, entering only "April" or "2017" would only delete fixed text; however, using formula-based fuzzy search can cover April, May, and any four-digit year in one go, making it suitable for more similar files.
Effect Preview: After processing, matched months and years have been deleted
After processing is complete, opening the PDF to check the result shows that the position where "April" was originally displayed is now blank, and the position where "2017" was originally displayed is also blank, while the intervening "13," remains. This indicates the software only deleted the content matching the rule, without clearing the entire page content or other text.
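The same before-and-after effect can be reproduced on a plain string. This is only an illustration of the replacement logic, assuming the rules behave like Python regular expressions; it does not edit a PDF, and `delete_matches` is a hypothetical name rather than part of the software.

```python
import re

# Rules from the example; an empty replacement stands in for the blank right-hand list.
RULES = [r"April|May", r"\d{4}"]


def delete_matches(text, rules=RULES):
    """Replace every match of every rule with an empty string."""
    for rule in rules:
        text = re.sub(rule, "", text)
    return text


print(repr(delete_matches("April 13, 2017")))
```

On the sample date, only "April" and "2017" are removed. The day "13," and the spaces around it are left in place, which mirrors the blank areas visible in the processed PDF.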

Judging from the result, batch fuzzy deletion of PDF text suits cleanup tasks with clear rules, such as deleting English month names, four-digit years, fixed-format numbers, or a category of sensitive words. As long as the keyword rules are set accurately, it can significantly reduce the repetitive manual work of opening PDFs, finding, editing, and saving.
Operation Step 1: Access PDF tools and select "Find and Replace Keywords in PDF"
After launching HeSoft Doc Batch Tool, select "PDF Tools" in the left function category. The main area will display multiple PDF-related batch processing functions. According to the screenshot, the function to use this time is the first item, "Find and Replace Keywords in PDF," with the description "Batch find and replace keywords in PDF file content."

The purpose of this step is to enter the functional module specifically for processing text keywords in PDFs. It differs from functions like PDF watermarking, adding PDF passwords, or converting PDF to Word, focusing instead on finding and replacing text within PDF content. Since our goal is to delete keywords, we will later leave the "replacement keyword list" blank, so the matched content is replaced with nothing.
Operation Step 2: Add the PDF files for batch processing
After entering the function, you can see buttons like "Add files," "Import files from folder," "Clear," and "More" at the top of the page. Step 1 of the task flow is "Select records to process." If there are only a few PDF files, you can click "Add files" to select them one by one; if the files are in a single folder, you can use "Import files from folder" to add multiple PDFs at once.

In the screenshot, 4 records have been added, named 1.pdf, 2.pdf, 3.pdf, and 4.pdf, with the pdf extension, located in the D drive test directory. The list also shows creation time, modification time, and an action column. After confirming the records are correct, click "Next" at the bottom to proceed to processing option settings.
The expected outcome of this step is: all PDFs needing keyword cleanup appear in the list, with the count, file names, and paths matching expectations. Before batch processing, it is recommended to double-check the file list to avoid adding PDFs that don't need processing. If files are added by mistake, you can remove them using the delete icon in the action column or use "Clear" to reselect.
Operation Step 3: Choose formula-based fuzzy search and enter the keyword rules to delete
Upon entering Step 2 "Set Processing Options," you will see "Set Keyword Options." In the "Search Method," the interface provides "Exact Text Search" and "Use formula for fuzzy text search." Since the content to be deleted varies in this example, select "Use formula for fuzzy text search."

In the "Keywords to find (Left List)," the screenshot shows two lines of rules entered: the first line is April|May, and the second line is \d{4}. April|May matches either "April" or "May"; \d{4} matches any sequence of four consecutive digits, commonly used for years like 2017, 2018, 2026, etc. With these two rules, English months and four-digit years in different PDFs can be found together.
On the right is the "Keywords after replacement (Right List)," with the prompt "Leave blank to delete." Therefore, if the goal is to batch delete these keywords from PDFs, no replacement text needs to be entered in the right list; leaving it empty is sufficient. The software will then replace the content matched from the left list with blank, achieving the deletion effect.
This step is crucial. It is recommended to test the rules' correctness on a small sample of files first. For instance, process just 1 PDF, confirm that April, May, and four-digit years are accurately deleted, and then execute the batch processing for the entire folder. For more complex content, such as serial numbers, dates, phone numbers, or contract numbers, corresponding fuzzy search rules can also be written based on the text pattern.
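One way to test rules before a real run is to list what each rule would hit in a piece of sample text, without changing anything. The sketch below assumes Python-style regular expressions; the sample sentence and the `preview` function are invented for illustration and are not taken from the software or the example PDFs.

```python
import re
from collections import Counter

RULES = [r"April|May", r"\d{4}"]


def preview(text, rules=RULES):
    """Count each distinct string every rule would delete, leaving the text unchanged."""
    return {rule: Counter(re.findall(rule, text)) for rule in rules}


# Hypothetical sample text containing a four-digit report number as well as dates.
sample = "Report No. 1024, issued April 13, 2017. Revised May 20, 2018."

for rule, hits in preview(sample).items():
    print(rule, dict(hits))
```

In this sample the year rule also picks up the report number "1024", which is exactly the kind of unintended match a small test run is meant to reveal.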
Operation Step 4: Continue by setting the save location and starting processing
After completing the processing option settings, click "Next" at the bottom of the page. As seen in the process flow bar, subsequent steps include "Set Save Location" and "Start Processing." Although the screenshot did not expand the details of these two pages, based on the interface flow, it is reasonable to infer that the next step requires confirming where the processed PDFs will be saved, before moving to the "Start Processing" phase.
It is advisable not to directly overwrite important original files, especially when using wildcards or formula rules for the first time. A safer approach is to save the processed PDFs in a new folder. After processing is complete, open a few files to spot-check the results, ensuring the keywords are deleted and the body text is not erroneously removed, before using them for formal archiving or external distribution.
When the task begins, the software will perform the find and replace on each PDF in the list sequentially. Compared to manually opening 4, 40, or even 400 PDFs to delete keywords one by one, the advantage of batch processing is obvious: the rules only need to be set once, and the software automatically applies them to all files.
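The "set the rules once, apply them to every file, save to a new folder" pattern can be sketched in a few lines. This example works on plain-text files as a stand-in, because editing PDF content requires a dedicated PDF library; the function names, the `*.txt` pattern, and the folder layout are all assumptions made for illustration and do not describe how the software is implemented.

```python
import re
from pathlib import Path

RULES = [r"April|May", r"\d{4}"]


def clean_text(text, rules=RULES):
    """Delete every match of every rule from the text."""
    for rule in rules:
        text = re.sub(rule, "", text)
    return text


def process_folder(src_dir, out_dir, pattern="*.txt"):
    """Apply the rules to each matching file in src_dir and write results to out_dir.

    The originals are never modified. Returns the list of output paths.
    """
    src, out = Path(src_dir), Path(out_dir)
    if src.resolve() == out.resolve():
        # Refuse to overwrite the originals in place.
        raise ValueError("output folder must differ from the source folder")
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob(pattern)):
        cleaned = clean_text(path.read_text(encoding="utf-8"))
        target = out / path.name
        target.write_text(cleaned, encoding="utf-8")
        written.append(target)
    return written
```

Writing to a separate output folder keeps the source files available for comparison and rollback, which matches the advice above about not overwriting important originals.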
Common Questions and Notes
1. Why can the replacement keyword list be left blank? As the prompt in the screenshot indicates, "Leave blank to delete." Therefore, when the right replacement list is empty, the software replaces the found content with empty content, achieving the effect of deleting PDF keywords.
2. What is the difference between exact search and formula-based fuzzy search? Exact search is suitable for deleting identical text, such as a fixed company name, fixed watermark text, or a fixed project name. Formula-based fuzzy search is suitable for deleting content that follows a pattern but is not exactly the same, such as different years, different months, continuous digits, or number fragments.
3. Will \d{4} delete all four-digit numbers? Yes. It matches any run of four consecutive digits, and under standard regular-expression behavior this includes four digits inside a longer number. Therefore, if the PDF contains four-digit numbers you do not wish to delete, use this rule cautiously and test it first. The broader the rule, the higher the risk of accidental deletion; the more precise the rule, the more controllable the processing result.
4. Can scanned PDFs be processed? If the PDF pages are essentially images with no selectable or copyable text layer, ordinary text search and replace might fail to match. Such files usually need OCR recognition first, then processing can be attempted based on the actual text layer situation.
5. Should I back up before batch processing? Backups are recommended. When batch modifying files, it's best to keep the original PDFs or save the output to a new directory. This allows for a quick rollback even if the rules are not set perfectly.
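To illustrate the point in question 3 about broad versus precise rules, the sketch below compares \d{4} with a narrower year pattern. It assumes Python-style regular expressions; whether the software's formula syntax supports word boundaries (`\b`) and non-capturing groups (`(?:...)`) is not stated in the article and should be checked on a test file. The sample text and the name `NARROW_YEAR` are hypothetical.

```python
import re

BROAD_YEAR = r"\d{4}"
# Assumed narrower rule: a standalone four-digit number starting with 19 or 20.
NARROW_YEAR = r"\b(?:19|20)\d{2}\b"

# Hypothetical text mixing a reference number, a phone number, and a date.
text = "Ref 10245, tel 5550 1234, dated April 13, 2017"

print("broad: ", re.findall(BROAD_YEAR, text))
print("narrow:", re.findall(NARROW_YEAR, text))
```

Here the broad rule also hits part of the reference number and both halves of the phone number, while the narrower rule matches only "2017". Even the narrower rule would still match a standalone non-year number such as "2000", so spot-checking the output remains necessary.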
Summary: Replace repetitive manual PDF opening with a single set of rules
With the "Find and Replace Keywords in PDF" function of HeSoft Doc Batch Tool, the previously repetitive and inefficient work of PDF keyword deletion can be turned into a standardized process. In the example shown in this article, we first imported 1.pdf to 4.pdf, then selected formula-based fuzzy search, entered April|May and \d{4}, left the replacement keyword list blank, and finally achieved the batch deletion of months and four-digit years in PDFs.
If you frequently need to clean dates, numbers, sensitive words, or other patterned text from PDF reports, contracts, archives, or information packs, you can follow the steps in this article, test with a few files first, and then scale up to batch file processing. This not only reduces repetitive labor but also lowers the probability of manual omission, making PDF batch processing more efficient and controllable.