This article describes how to use HeSoft Doc Batch Tool to batch-extract fixed-format text, such as contract numbers and document numbers, from multiple PDF files and automatically rename each PDF to the text extracted from it. In the example, the original file names are 1.pdf, 2.pdf, 3.pdf, and 4.pdf; after processing, they become number-based file names such as 10026877.pdf and 20036655.pdf. The approach suits batch archiving of contracts, invoices, reports, and archival materials.
In managing PDF files such as contracts, orders, invoices, inspection reports, and scanned archives, many files, when first exported or scanned, have filenames that are just simple serial numbers, like 1.pdf, 2.pdf, 3.pdf, or 4.pdf. The truly meaningful business information is often within the PDF body, such as contract numbers, order numbers, customer numbers, or report numbers. If you open each PDF to check the number and then manually rename it, it is not only time-consuming but also prone to errors like copying the wrong number, missing changes, or creating duplicate names.
The problem this article aims to solve is this: when multiple PDF files each contain a fixed-format number, how can you use a wildcard or regular expression to batch-match that number and use the matched text as the new PDF filename? In the example, the contract number in the PDF body is an 8-digit number, such as 10026877. After processing, the filename automatically becomes 10026877.pdf. The entire process uses the office software "HeSoft Doc Batch Tool", which is designed for batch processing of document files and reducing repetitive work. It suits office scenarios that require centralized organization of large volumes of PDF, Word, Excel, PPT, and text files.
Applicable Scenarios: Which PDF Files Are Suitable for Batch Renaming by Content Number
Batch renaming PDFs with an expression works best for materials where a "stable naming basis exists within the file content." For instance, the first page of every contract carries a contract number (Contract No.) or project number; every invoice, statement, or expense report has an invoice number or serial number; the first page of every report has an inspection number, sample number, or case number. As long as these numbers exist as recognizable text in the PDF body and their format is relatively fixed, you can consider extracting them in batches using expressions.
In search queries, this need is often phrased as "rename PDF by content," "extract number from PDF as filename," "batch rename PDF files," "auto-name PDF by contract number," or "rename PDF files with regular expressions." Although the example here uses PDFs, similar logic can be extended to other office file management scenarios, such as using the contract number in a Word document as the .docx or .doc filename, or a number in a text file as the .txt filename. The screenshots and steps in this article, however, focus on PDF files.
It should be noted that both wildcard and regular expressions are pattern-matching methods. The software interface in the screenshot uses a "Regular Expression" input box; the sample expression is \d{8}, which means matching a sequence of 8 consecutive digits. For average users, this can be understood as a more precise "wildcard matching rule": instead of specifying a specific number, you tell the software "find the text that is a sequence of 8 consecutive digits in the PDF content."
Result Preview: Meaningless Filenames Before Processing, Contract Numbers After
Before Processing: PDF Files Named with Simple Serial Numbers
In the screenshot before processing, you can see there are 4 PDF files in the folder, named 1.pdf, 2.pdf, 3.pdf, and 4.pdf. These names tell the files apart, but they do not tell us which contract or number each PDF corresponds to. Later retrieval, archiving, uploading to a system, or sending to a colleague would require opening each file to confirm its content.

After opening one of these PDFs, you can see the precise location of the contract number in the main text. The screenshot highlights the content "Contract No. 10026877" with a red box, which is exactly the key information suitable for extraction as the filename. If every PDF has a similar 8-digit contract number, batch renaming can be completed in one go using an expression.

After Processing: Filenames Automatically Become the Numbers from the PDF Content
After processing is complete, the previously meaningless 1.pdf, 2.pdf, 3.pdf, 4.pdf have been batch-changed to 10026877.pdf, 20036655.pdf, 20100511.pdf, 33952100.pdf. This way, you can directly tell the corresponding contract or material number from the filename without opening the PDF, significantly improving subsequent search and archiving efficiency.

Operational Steps: Using HeSoft Doc Batch Tool to Extract an 8-Digit Number from a PDF
Step 1: Enter the "File Name" function category and select "Rename PDF files using file content"
After launching HeSoft Doc Batch Tool, you can see categories such as Home, Task Flow, All Tools, File Name, Folder Name, File Management, Word Tools, Excel Tools, PowerPoint Tools, and PDF Tools in the left function bar. Since the goal is to batch-modify filenames, enter the "File Name" category.
In the function card, select "7. Rename PDF files using file content". The interface description shows that this function is used to "batch-use certain text from PDF file content as the filename of that file." This precisely matches the need of this article: to extract the contract number from the PDF body text and automatically generate a new PDF filename.

The purpose of this step is to select the correct batch processing tool entry point. The expected result is to enter a wizard page with steps, allowing you to subsequently add PDFs, set matching rules, set the save location, and begin processing.
Step 2: Add the PDF files to be processed and confirm the file list
After entering the function page, the top of the interface displays the current function name "Rename PDF files using file content". The page uses a step-by-step process: Step 1 is "Select records to process", Step 2 is "Set processing options", Step 3 is "Set save location", and Step 4 is "Start processing".
In Step 1, you can add PDFs to the list one by one using the "Add Files" button at the top right, or use "Import Files from Folder" to import all PDFs from a folder at once. The screenshot already has 4 files imported, named 1.pdf, 2.pdf, 3.pdf, and 4.pdf, located in the D:\test directory, with the .pdf extension. The table also shows creation time, modification time, and other information, with the total record count at the bottom being 4.

The purpose of this step is to add the PDF files to be batch-renamed to the processing queue. The expected result is that the list shows all PDFs that need processing, and the count matches the actual number of files. If irrelevant files are added by mistake, you can remove them using the delete icon in the interface; if you need to start over, the "Clear" button on the interface empties the current list.
Step 3: Set the matching area and select the text matched by a custom expression
After clicking the "Next" button at the bottom, you enter Step 2 "Set processing options". In the "Search Area", the interface provides multiple options, including "First line of text", "First barcode image", and "Text matched by custom formula". This example requires extracting an 8-digit contract number from the PDF body, so "Text matched by custom formula" is selected.

The reason for choosing this option is that the contract number is not always the very first line of the text, nor is it a barcode image, but a segment of numerical text within the body. Using a custom formula allows the software to actively search for content meeting the criteria, rather than relying on a fixed line number. For a large number of PDFs, this method is more stable than manual one-by-one positioning and more suitable for batch processing.
Step 4: Enter the regular expression to match 8 consecutive digits
In the "Regular Expression" input box, the sample screenshot shows \d{8} entered. This expression means: match a sequence of 8 consecutive digits. Here, \d represents a single digit, and {8} means the preceding item is repeated exactly 8 times. This rule is a natural fit for contract numbers, order numbers, project numbers, and similar identifiers that are always 8 digits long in the PDF documents.
For example, when the text "Contract No. 10026877" appears in a PDF body, the expression \d{8} will match 10026877. The software can then use the matched text as the new filename, so the original 1.pdf is renamed to 10026877.pdf. Other files will be processed by the same rule, resulting in 20036655.pdf, 20100511.pdf, 33952100.pdf, etc.
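The matching behavior described above can be tried outside the tool with a short Python sketch. It only illustrates what \d{8} matches and is not how HeSoft Doc Batch Tool is implemented; the sample text and the helper name first_eight_digits are hypothetical.

```python
import re

# Hypothetical body text, standing in for the text layer of one PDF.
sample_text = "Sales Contract\nContract No. 10026877\nSigned by both parties."

def first_eight_digits(text):
    """Return the first run of 8 consecutive digits in text, or None if there is none."""
    match = re.search(r"\d{8}", text)
    return match.group(0) if match else None

number = first_eight_digits(sample_text)
if number is not None:
    # The matched text becomes the filename body; the extension is kept.
    print(number + ".pdf")
```

Returning None when nothing matches makes it easy to spot files that contain no 8-digit number before any renaming is attempted.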
If the number in your PDFs is not 8 digits long, adjust the expression to the actual format. For a 6-digit number, match 6 consecutive digits; if the number contains letters, hyphens, or a fixed prefix, write a rule that mirrors that format. This article does not cover complex expression writing in depth. The key point is that the example in the screenshot uses \d{8} to meet the need of "batch renaming by the 8-digit number in the PDF content".
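As a rough illustration of adjusting the rule, the Python sketch below tries a few patterns against made-up numbers. The formats (a 6-digit number, an "HT" prefix, a hyphenated code) are assumptions chosen for the example, not formats taken from the article's screenshots, and the tool's regular expression engine may differ from Python's in details.

```python
import re

# Hypothetical formats and sample lines; replace them with what your PDFs actually contain.
examples = [
    ("six digits", r"\d{6}", "Order No. 300457"),
    ("fixed prefix", r"HT\d{6}", "Contract No. HT202401"),
    ("letters and hyphens", r"[A-Z]{2}-\d{4}-\d{3}", "Report No. AB-2024-001"),
]

def extract(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

for label, pattern, text in examples:
    print(label, "->", extract(pattern, text))
```

Each pattern uses only digit classes, character ranges, and repetition counts, which are among the most widely supported regular expression features.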
Step 5: Select the filename location to replace the entire filename
At the bottom of the same settings page, you can see the "Position" option, including "Replace the entire filename", "To the left of the filename", and "To the right of the filename". This example selects "Replace the entire filename". This means the software will replace the original filename body with the matched contract number, keeping the .pdf extension.
Choosing "Replace the entire filename" is suitable for scenarios where you want the filename to consist entirely of the number, such as generating 10026877.pdf. If you wish to keep the original serial number or append the number before or after the original filename, you can choose the left or right position option based on your actual needs. However, judging by the processing results in the screenshot, this example uses the method of directly replacing the original filename with the number.
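The three position options can be pictured with a small Python sketch. The function name build_name and the position keywords are hypothetical, and the sketch joins the parts directly because the screenshots do not show whether the tool inserts a separator for the left and right options.

```python
from pathlib import PurePath

def build_name(original, matched, position="replace"):
    """Combine the matched text with the original filename, keeping the extension."""
    path = PurePath(original)
    stem, extension = path.stem, path.suffix
    if position == "replace":      # "Replace the entire filename"
        new_stem = matched
    elif position == "left":       # "To the left of the filename"
        new_stem = matched + stem
    elif position == "right":      # "To the right of the filename"
        new_stem = stem + matched
    else:
        raise ValueError("unknown position: " + position)
    return new_stem + extension

print(build_name("1.pdf", "10026877"))           # the option used in this example
print(build_name("1.pdf", "10026877", "left"))
print(build_name("1.pdf", "10026877", "right"))
```

In every case the .pdf extension is left untouched; only the part before it changes.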
Step 6: Continue to the next step, set the save location, and start processing
After completing the matching rule and position settings, click the "Next" button at the bottom of the page to move on to "Set save location". As the screenshot shows, the wizard includes Step 3 "Set save location" and Step 4 "Start processing". Because users differ in how they protect original files, decide on the save location before batch processing: if the software offers an alternative save location, prefer saving to a new folder so the results are easier to verify; if you need to overwrite or alter the original filenames, back up the original PDFs first.
After the settings are complete, enter "Start processing". Once processing is finished, return to the folder to check the results. If the filenames have changed from 1.pdf, 2.pdf, etc., to the corresponding 8-digit numbers, it means the expression matching and batch renaming have been successfully completed.
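For a large batch, checking the results by eye is slow. A minimal Python sketch of the same check follows; the function name unrenamed is hypothetical, and the list of names stands in for a directory listing of the output folder.

```python
import re

def unrenamed(filenames, pattern=r"\d{8}\.pdf"):
    """Return the names that are not exactly 8 digits followed by .pdf."""
    checker = re.compile(pattern, re.IGNORECASE)
    return [name for name in filenames if not checker.fullmatch(name)]

# Filenames as they appear in the article's "after" screenshot.
after = ["10026877.pdf", "20036655.pdf", "20100511.pdf", "33952100.pdf"]

print(unrenamed(after))               # an empty list means every file was renamed
print(unrenamed(["1.pdf"] + after))   # lists any file that kept its old name
```

Any name reported by the check points to a PDF in which the expression found no match, which is worth opening to see why.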
Common Issues and Notes
1. Why use \d{8} instead of directly entering 10026877?
Entering 10026877 directly only matches one specific number, whereas the key to batch renaming is that the number in each PDF is different. Using an expression like \d{8} tells the software to match "any sequence of 8 consecutive digits", so it can simultaneously process different numbers like 10026877, 20036655, 20100511, 33952100.
2. What if there are multiple 8-digit numbers in the PDF?
If the PDF body contains other sequences of 8 consecutive digits, such as dates, phone numbers, or other serial numbers, in addition to the contract number, simply using \d{8} might match non-target content. It is recommended to spot-check a few PDFs to confirm if the target number is unique in the document. If it is not, you need to adjust the expression based on the nearby text, the number's format, or more precise rules.
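The Python sketch below shows how a bare \d{8} can pick up the wrong text and how anchoring on the nearby label narrows the match. The sample text is made up, and the lookbehind and lookahead syntax is an assumption about the tool: it works in Python, but whether the tool's "Regular Expression" box supports it is not confirmed by the article.

```python
import re

# Hypothetical body text with a date and a phone number in addition to the contract number.
text = "Date 20240315\nContract No. 10026877\nTel 13800138000"

# A bare \d{8} returns the first 8-digit run it finds, here the date.
plain_first = re.search(r"\d{8}", text).group(0)

# It also matches the first 8 digits of the longer phone number.
plain_all = re.findall(r"\d{8}", text)

# Requiring a non-digit on both sides excludes partial matches inside longer numbers.
exact_all = re.findall(r"(?<!\d)\d{8}(?!\d)", text)

# Requiring the label "Contract No. " immediately before the digits selects the target.
anchored = re.search(r"(?<=Contract No\. )\d{8}(?!\d)", text).group(0)

print(plain_first)
print(plain_all)
print(exact_all)
print(anchored)
```

If the tool does not accept lookarounds, spot-checking the documents and choosing a more specific number format remains the fallback described above.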
3. Can numbers be recognized in scanned PDFs?
The PDF content in this article's screenshots is displayed as text in the reader, so the software can perform text-based matching. If a PDF is a pure image scan, the number hasn't been recognized as text, and batch extraction may not yield the expected result. When dealing with scanned materials, it is typically necessary to perform text recognition (OCR) first before doing content matching.
4. Is a backup necessary before renaming?
A backup is recommended. The advantage of batch file processing is speed, but if the rules are set incorrectly, it can also batch-generate filenames that do not meet expectations. In daily office work, you can first test with a small number of files to confirm the expression and naming results are correct, then process the large volume of PDFs in the complete folder.
5. What should be noted about duplicate filenames?
If the same number is extracted from two PDFs, there is a risk of duplicate names. Before processing, you should confirm whether the numbers are unique, especially in folders with many contract copies, supplementary agreements, attachments, or duplicate scans. For materials that require version differentiation, consider retaining date, serial number, or other information alongside the number.
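A uniqueness check of this kind can be sketched in Python before any file is touched. The mapping of original names to planned names is hypothetical sample data, with one deliberate duplicate, and find_collisions is an illustrative helper rather than a feature of the tool.

```python
from collections import defaultdict

def find_collisions(planned):
    """Map each planned filename to its source files, keeping only names used more than once."""
    sources = defaultdict(list)
    for original, new_name in planned.items():
        sources[new_name].append(original)
    return {new: olds for new, olds in sources.items() if len(olds) > 1}

# Hypothetical plan: original filename -> filename built from the extracted number.
planned = {
    "1.pdf": "10026877.pdf",
    "2.pdf": "20036655.pdf",
    "3.pdf": "10026877.pdf",  # assumed duplicate scan of the first contract
}

print(find_collisions(planned))
```

An empty result means every planned name is unique; otherwise the listed source files need a date, serial number, or other distinguishing text added to their names.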
Summary: Using Expressions for Batch Renaming PDFs Makes File Archiving More Efficient
As the example in this article shows, HeSoft Doc Batch Tool, a document batch processing program for office scenarios, can extract key information from PDF body text and batch-generate standardized filenames. A process that previously required opening each PDF, checking the contract number, and manually copying, pasting, and renaming can now be completed in one pass by adding files, setting the expression \d{8}, choosing to replace the filename, and setting a save location.
For tasks like contract management, archive organization, financial document archiving, and project material transfer, batch renaming PDFs not only saves time but also reduces manual entry errors. If you also have a large number of PDF files with chaotic names but numbers contained within the body text, it is recommended to first extract a few for rule testing, then use this function to batch-process the entire folder, making file naming more standardized, searching more convenient, and subsequent collaboration smoother.