When multiple PDF files contain dates, years, numbers, or sensitive words that need to be deleted, opening each file to find and remove them by hand is very time-consuming. This article uses HeSoft Doc Batch Tool as an example to show how the "Find and Replace Keywords in PDF" feature, combined with formula-based fuzzy text search, can match and delete content such as April, May, and four-digit years across multiple PDFs in one pass. Using before-and-after results and the software interface, it walks through the complete process: adding PDFs, setting wildcard rules, leaving the replacement blank so that matches are deleted, and then choosing a save location and starting processing. The method suits office scenarios that require batch cleaning of PDF text content.
In daily office work, PDF files are often used to archive contracts, reports, audit materials, notification documents, or public documents. If text that needs to be cleaned up appears repeatedly in these PDFs, such as months in dates, four-digit years, fixed numbers, batch numbers, project codes, or sensitive information, opening each file and deleting it by hand is very inefficient. When there are many files and each PDF has many pages, manual searching is not only time-consuming but also prone to missed deletions.
This article addresses one question: how to use wildcards or formulas for fuzzy matching to batch delete keywords from multiple PDF files. Taking the office software "HeSoft Doc Batch Tool" as an example, the "Find and Replace Keywords in PDF" function in its PDF tools finds content matching the rules across multiple PDFs; leaving the replacement content blank turns the replacement into a batch deletion. In the example, the content to delete is the English month and the four-digit year in the date on the PDF cover: deleting "April" and "2017" from "April 13, 2017" leaves only the middle "13,".
Applicable Scenarios: What PDF Content is Suitable for Batch Fuzzy Deletion with Wildcards
Batch deleting PDF keywords with wildcards suits text that follows a pattern but is not entirely fixed. Exact search can delete only one identical string, while fuzzy searching handles cases where the content is similar but parts of it vary. For example, across multiple PDFs some dates might be April 13, 2017 and others May 10, 2018. Entering each complete date would take many rules; with formula-based fuzzy searching, a few rules can match a whole category of content.
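As a rough illustration of why two rules can cover many dates, the sketch below uses Python's re module. It assumes the tool's formulas behave like ordinary regular expressions, which the two example rules in this article suggest but the source does not confirm. The sample dates and variable names are made up for illustration.

```python
import re

# Hypothetical cover dates taken from different PDFs.
dates = ["April 13, 2017", "May 10, 2018", "April 2, 2026"]

# Two rules cover every date above; no complete date has to be typed in.
month_rule = re.compile(r"April|May")
year_rule = re.compile(r"\d{4}")

# For each date, collect what the month rule and the year rule would match.
matches = [(month_rule.findall(d), year_rule.findall(d)) for d in dates]

for date, (months, years) in zip(dates, matches):
    print(date, "->", months, years)
```

Day numbers such as 13 or 2 are left alone because they have fewer than four digits.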
Common applicable scenarios include:
- Batch deleting month names in PDFs, such as the English months April, May, etc.
- Batch deleting four-digit years in PDFs, such as 2017, 2018, 2026, etc.
- Batch cleaning parts of project numbers, report numbers, or contract numbers in files.
- Batch deleting recurring sensitive words, department names, contact information, or version marks in PDFs.
- Batch processing a group of scanned PDFs whose text has been recognized by OCR, to uniformly remove text that should not be displayed.
Note that this article demonstrates finding and replacing text in PDF content. If a PDF page is a pure image with no recognizable text layer, an ordinary text search may not match the text in the image. For such files, first confirm that the text in the PDF can be selected and copied before processing.
Preview of Effect: Before Processing, Multiple PDFs Contain Date Keywords Needing Deletion
This example uses 4 PDF files, named 1.pdf, 2.pdf, 3.pdf, and 4.pdf. They are in the same folder and need to be added to the software together for batch processing. Batches like this are common in office work: the same batch of reports, archival materials, or publicly released documents.

Opening one of the PDFs shows that the date on the cover reads "April 13, 2017". In the screenshot, red boxes mark the two pieces of content to process: the English month "April" and the four-digit year "2017". These are the targets that the rules will delete later. Because different files may contain different months, such as April or May, and possibly different years, formula fuzzy searching is a better fit than entering complete dates one by one.

Post-Processing Effect: Month and Year in PDF Batch Deleted
After processing is complete, open the PDF to check the result. The original "April 13, 2017" has changed: the English month and the four-digit year have been deleted, leaving only the middle "13,". This shows that the fuzzy search rules matched the target text and that, because the replacement keyword list was empty, the software deleted the matched content.

The advantage of this method is that there is no need to search each PDF page by page. Once the rules are set correctly, the same set of rules can process multiple PDFs in one batch, which suits office tasks that repeatedly require cleaning dates, years, numbers, and keywords.
Steps: Using HeSoft Doc Batch Tool to Batch Delete PDF Keywords
Step One: Enter PDF Tools, Select Find and Replace Keywords in PDF
After opening "HeSoft Doc Batch Tool", select "PDF Tools" from the tool categories on the left, then find "Find and Replace Keywords in PDF" in the PDF tools list. The screenshot shows its description, "Batch find and replace keywords in PDF file content"; this is the core function used in this article.

The purpose of this step is to enter the batch workflow dedicated to PDF text find and replace. Unlike an ordinary PDF reader, this kind of office software is built for batch file processing: rules are configured once and applied to multiple PDFs, which removes the repeated manual work of opening, searching, editing, and saving.
Step Two: Add Multiple PDF Files to Process
After entering the function page, buttons such as "Add Files", "Import Files from Folder", "Clear", and "More" appear at the top of the interface. For a small number of files, click "Add Files" to select them one by one; if the PDFs are all in the same folder, "Import Files from Folder" is more suitable because it imports the whole batch at once.
In the example, 4 PDF files have been imported, and the list shows each file's serial number, name, path, extension, creation time, and modification time. The files are 1.pdf, 2.pdf, 3.pdf, and 4.pdf in the test directory on drive D, the extension is pdf, and the total record count at the bottom is 4.

The expected result of this step is that every PDF needing batch keyword deletion appears in the list. If a file that does not need processing was imported, remove it with the delete operation on the right side of the list; if the wrong files were imported, use "Clear" and add them again. Once the list is correct, click the "Next" button at the bottom to move on to the processing options.
Step Three: Select Formula Fuzzy Search Text, Enter Wildcard Rules
After entering "Set Processing Options", first set the search method. The screenshot shows two search methods: "Exact Search Text" and "Use Formula Fuzzy Search Text". Because the content to delete is not a fixed string but text that varies in a regular way, such as months and years, choose "Use Formula Fuzzy Search Text".

In the "Keyword List to Find", the example inputs two rules:
- April|May: Used to match April or May. The vertical bar here signifies an "or" relationship, suitable for matching multiple possible month words in a single rule.
- \d{4}: Used to match four-digit numbers, such as 2017, 2026, etc. For content like years, which are fixed as four-digit numbers, this rule is more efficient than inputting years one by one.
On the right is the "Keyword List After Replacement", where the interface prompt reads "Leaving blank means deletion". So to batch delete the matched content, do not fill in any replacement text on the right. In other words, the left side identifies the content to delete, the right side stays empty, and the software deletes the matched text during processing.
This is the most critical setting in the entire workflow: the more precise the rules, the closer the results will be to what you expect. If you only want to delete April and May, do not write overly broad rules; if you only want to delete the year, fill in only \d{4}. To delete several categories of text at once, enter multiple rules on separate lines, as in the example.
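The effect of the two rules with an empty replacement list can be approximated in Python as follows. This is a minimal sketch, not the tool's implementation: it assumes the formulas are standard regular expressions and that rules are applied one after another. The apply_rules helper and the way it pairs the two lists are hypothetical.

```python
import re

def apply_rules(text, find_rules, replacements):
    """Apply each find rule in order; a blank replacement deletes the match."""
    for i, rule in enumerate(find_rules):
        # Assumption: a missing or blank entry on the right side means deletion.
        replacement = replacements[i] if i < len(replacements) else ""
        # The lambda inserts the replacement literally, without escape handling.
        text = re.sub(rule, lambda m: replacement, text)
    return text

find_rules = ["April|May", r"\d{4}"]  # one rule per line in the left list
replacements = []                      # right list left empty

result = apply_rules("April 13, 2017", find_rules, replacements)
print(repr(result))
```

Only the matched text is removed, so the spaces around the deleted month and year remain and the cover date is left as "13," with surrounding whitespace.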
Step Four: Continue to the Next Step, Set Save Location and Start Processing
After setting the keyword rules, click the "Next" button at the bottom. The workflow bar at the top of the interface shows two remaining stages: "Set Save Location" and "Start Processing". Follow the software prompts to choose where the processed files will be saved, then proceed to the start processing stage.
It is recommended not to directly overwrite the original files, especially when using wildcard or formula fuzzy rules for the first time. A safer approach is to save the processed PDFs to a new folder, first spot-check a few files to confirm the deletion effect is correct, and then decide whether to replace the original files. This way, even if the rules are not set accurately enough, the original PDFs are kept as backups.
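If you prefer to keep an explicit copy of the originals before running any batch rules, a few lines of Python can do it. This is a sketch with hypothetical folder names; the demo at the bottom creates placeholder files in a temporary directory so that it does not touch real documents.

```python
import shutil
import tempfile
from pathlib import Path

def back_up_pdfs(source_dir: Path, backup_dir: Path) -> list[str]:
    """Copy every .pdf file in source_dir into backup_dir; return copied names."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(source_dir.glob("*.pdf")):
        shutil.copy2(pdf, backup_dir / pdf.name)  # copy2 also keeps timestamps
        copied.append(pdf.name)
    return copied

# Demo with throwaway placeholder files standing in for the four example PDFs.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "test"
    source.mkdir()
    for name in ["1.pdf", "2.pdf", "3.pdf", "4.pdf"]:
        (source / name).write_bytes(b"%PDF-1.4 placeholder")
    copied = back_up_pdfs(source, Path(tmp) / "backup")
    print(copied)
```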
After processing is complete, open the PDFs in the output folder for checking. The results in the example show that "April" and "2017" have been deleted, indicating the rules were successfully applied. Other PDFs containing similar date formats will also be batch processed according to the same rules.
Wildcard Rule Setting Suggestions: How to Reduce Accidental Deletion
When using formula fuzzy search, controlling the match scope matters most. Take \d{4} as an example: it matches four-digit numbers. That makes it well suited to deleting years, but any other four-digit numbers in the PDF, such as report numbers, page numbers, or project numbers, may also be matched. So before formal batch processing, first assess the text structure of the documents.
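The over-matching risk is easy to reproduce with Python's re module. The sample text is invented, and the narrower pattern relies on \b and (?:...), which standard regular expressions support; whether the tool's formula syntax accepts them is an assumption to verify on a test file.

```python
import re

# Hypothetical page text: a report number, a cover date, and a longer reference ID.
text = "Report No. 4521, issued April 13, 2017, ref 20170413"

# The broad rule removes every run of four digits, including parts of longer numbers.
broad = re.sub(r"\d{4}", "", text)
print(repr(broad))   # 'Report No. , issued April 13, , ref '

# A narrower rule: a standalone four-digit number that starts with 19 or 20.
narrow = re.sub(r"\b(?:19|20)\d{2}\b", "", text)
print(repr(narrow))  # 'Report No. 4521, issued April 13, , ref 20170413'
```

The narrower rule keeps the report number and the reference ID, but it would still delete any other standalone number that looks like a year, so a trial run on sample files is still worthwhile.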
If you only need to process the cover date and the document contains many four-digit numbers, use the standalone \d{4} rule with caution. Test on a small number of files first to confirm that it does not delete other important information. The same applies to the month rule: April|May matches only these two English words. To also delete months such as June or July, add the corresponding rules.
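A rule covering all twelve English months can be assembled rather than typed by hand. The sketch below builds the alternation in Python and tries it on an invented sentence; it assumes the resulting string can be pasted into the tool as a single formula, which the source does not confirm.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Join the names with the vertical bar, the same "or" used in April|May.
month_rule = "|".join(MONTHS)
print(month_rule)

# Try the rule on sample text before using it on real files.
cleaned = re.sub(month_rule, "", "June 5, 2019 and July 8, 2020")
print(repr(cleaned))
```

A rule this broad also matches the same letters elsewhere, for example "May" at the start of a sentence such as "May be revised", so check the sample output carefully.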
The interface also has an "Ignore letter case" option. If the PDFs may contain variants such as April, APRIL, or april, check it as the situation requires; if letter case carries meaning in your documents, leave it unchecked unless you are sure.
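The trade-off behind a case-insensitive search can be seen with the re.IGNORECASE flag in Python, used here as a stand-in for the "Ignore letter case" checkbox. That the checkbox behaves the same way is an assumption, and the sample strings are invented.

```python
import re

rule = "April|May"
samples = ["April", "APRIL", "april"]

# Case-sensitive matching finds only the exact capitalization.
case_sensitive = [bool(re.search(rule, s)) for s in samples]

# Ignoring case finds all three variants.
ignore_case = [bool(re.search(rule, s, flags=re.IGNORECASE)) for s in samples]

# Side effect: the ordinary word "may" is now deleted as well.
side_effect = re.sub(rule, "", "Results may vary", flags=re.IGNORECASE)

print(case_sensitive, ignore_case, repr(side_effect))
```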
Common Questions and Precautions
1. Why is the replacement keyword list left blank?
Because the goal this time is deletion, not replacement with other text. The interface already prompts "Leaving blank means deletion", so keeping the right side empty is correct. If you fill in new content on the right, the software will replace the matched keywords with the filled content, instead of deleting them.
2. Why choose formula fuzzy search instead of exact search?
Exact search suits text that is always identical, for example when the phrase "Internal Information" appears unchanged in every PDF. Formula fuzzy search suits text that varies according to a pattern, such as different years, months, or numbers. The months and years in this article can vary, so formula fuzzy search is more efficient.
3. Can dozens or hundreds of PDFs be processed at once?
By design, this tool is oriented towards batch file processing: multiple PDFs are added to a list and then processed together. How many files to process in one run should be decided based on computer performance, PDF size, and page count. When there are many files, processing them in several batches makes it easier to check results and pinpoint problems.
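When a folder holds a large number of PDFs, the file list can be split into fixed-size groups before each run. The sketch below shows one way to do that in Python; the helper name, the batch size of 4, and the file names are arbitrary choices for illustration.

```python
def split_into_batches(paths, batch_size):
    """Split a list of file paths into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]

# Ten hypothetical files processed four at a time.
files = [f"{n}.pdf" for n in range(1, 11)]
batches = split_into_batches(files, 4)

for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {batch}")
```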
4. Do I need to back up the original PDFs before processing?
Backup is recommended. Especially with broad rules like \d{4}, a backup prevents a situation where accidental deletions are hard to undo. Better still, save the processing results to a new folder and then spot-check them manually.
Summary: Turn Repetitive Deletion into a One-Time Configuration with Batch Processing Tools
The core idea of batch deleting keywords in PDFs is not complicated: add multiple PDFs to "HeSoft Doc Batch Tool", go to "Find and Replace Keywords in PDF", select "Use Formula Fuzzy Search Text", fill in the wildcard or formula rules to match on the left, leave the replacement content on the right blank, and finally set the save location and start processing.
Compared with opening PDFs one by one to find and delete text by hand, batch processing significantly reduces repetitive work, and it is especially suitable for large volumes of reports, contracts, archived files, and public documents. Before formally processing a large number of PDFs, test the rules on a few sample documents, and run the full batch only once the result is confirmed. This both improves efficiency and reduces the risk of accidental deletion.