Batch renaming of PDF files: Extract contract numbers as file names using wildcards/regular expressions


Update Time: 2026-06-05 09:42:45

Disclaimer: All images, text, and video content on the website are for reference only and may not be the latest, correct, or accurate. In case of any dispute, please refer to the actual experience effect!

This article shows how to use HeSoft Doc Batch Tool to batch rename PDF files that were saved under meaningless names such as 1.pdf and 2.pdf, using the contract number found in each file's content. The "Rename PDF Files Using File Content" feature, combined with the regular expression \d{8} (which works like a more precise wildcard), finds an 8-digit number in the PDF body and overwrites the original filename with it. This suits office scenarios where contracts, orders, archives, and scanned documents need to be filed by number.

In daily office work, many PDF files are initially saved simply by download order, scan order, or temporary numbering, such as 1.pdf, 2.pdf, 3.pdf, 4.pdf. While such filenames are convenient for temporary storage, they are not conducive to subsequent retrieval, archiving, and sharing. Especially for PDF documents like contracts, agreements, quotations, orders, invoices, and project materials, the truly valuable information often lies within the file content, such as contract numbers, order numbers, client numbers, or project numbers. If you open each PDF one by one to check the number and then manually modify the filename, it is not only time-consuming but also prone to copying errors, omissions, or duplicate names.

This article addresses this exact problem: how to use the batch processing capabilities in office software to extract fixed-format numbers from the content of many PDF files using wildcard/regular expression rules, and then batch rename the PDFs to the corresponding numbers. The following uses HeSoft Doc Batch Tool as an example to demonstrate the complete workflow of batch renaming multiple PDF files from "1.pdf, 2.pdf…" to "10026877.pdf, 20036655.pdf…".

Applicable Scenarios

This method is particularly suitable for batch PDF renaming tasks where there is a large number of files, uniform naming rules, and identifiable numbers in the main text. Unlike ordinary "find and replace filename" operations, this does not modify text within the original filename but reads specified text from the PDF content and then uses that read text as the new filename.

Common scenarios include:

  • Batch Renaming Contract PDFs: Extract Contract No., contract number, agreement number, and similar information from the first page of the contract.
  • Batch Naming Order PDFs: Extract order numbers, purchase order numbers, or customer order numbers from the order body.
  • Archiving Scanned Copies: Scanned file names might be 1.pdf, 2.pdf, scan001.pdf, and need to be archived by their body numbers.
  • Organizing Financial Documents: Extract numbers from invoices, payment applications, expense reports, and other PDFs to use as filenames.
  • Managing Project Materials: Extract project numbers, task numbers, or work order numbers to uniformly name the PDF files.

If these numbers follow a fixed format, for example all are 8 digits, you can match them with a wildcard-like expression. The operation in the screenshot uses the regular expression "\d{8}", which matches a sequence of 8 consecutive digits. Office users can think of it as a more precise wildcard: instead of specifying each number by hand, you let the software find the text that fits the rule.
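As a minimal illustration of what this rule does, the following Python sketch applies the same pattern to one line of sample text. Python's re module is used here only for demonstration, and the sample text is made up; the regular expression engine inside HeSoft Doc Batch Tool may differ in its details.

```python
import re

# Hypothetical text as it might appear on the first page of a contract.
sample_text = "Contract No. 10026877  Party A: Example Co., Ltd."

# \d matches one digit; {8} repeats it eight times in a row.
match = re.search(r"\d{8}", sample_text)
contract_number = match.group(0) if match else None

print(contract_number)
```

The rule never mentions the number itself, only its shape, which is why the same expression can be reused for every file in the batch.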

Result Preview: Before and After Processing

Before Processing: Filenames only show sequence numbers, revealing nothing about the content

The filenames before processing are very simple: 1.pdf, 2.pdf, 3.pdf, 4.pdf. From the filenames alone, it is impossible to tell which contract each PDF contains, or to search for a contract number by filename.

After opening one of the PDFs, you can see a clear contract number in the document content. For example, the first page of the contract in the screenshot shows "Contract No." followed by the 8-digit string "10026877". This is exactly the content we want to extract and use as the filename.

After Processing: PDF filenames are changed to the numbers from the body text

After the batch process is complete, the original sequential filenames are replaced by the 8-digit numbers identified from the PDF content. In the example, the filenames become 10026877.pdf, 20036655.pdf, 20100511.pdf, 33952100.pdf. This way, you can identify and search for the corresponding contract by filename without opening the file.

For office documents requiring long-term archiving, this naming method is more reliable than simple serial numbers. Subsequently, whether searching in a local folder, shared drive, enterprise cloud drive, or document management system, you can directly locate the file using the number.

Steps

Step 1: Enter the "Rename PDF files using file content" function

After opening HeSoft Doc Batch Tool, you can see multiple office file processing categories on the left, including File Name, Folder Name, File Organization, Word Tools, Excel Tools, and PDF Tools. Because the task here is to change PDF filenames, select the "File Name" category on the left.

In the function cards, find "7. Rename PDF files using file content". According to its description, this function takes a piece of text from each PDF's content and uses it as that file's name, in batch. That is exactly what is needed to extract contract numbers from contract PDFs and rename the files.

The purpose of this step is to open the correct batch processing function. Unlike common filename replacement, this function reads PDF content, making it suitable for documents where the filenames are meaningless but the body contains valid numbers.

Step 2: Add the PDF files to be batch renamed

After entering the function page, the top of the interface shows the current function as "Rename PDF files using file content". The first step on the page is "Select records to process". You can select PDFs one by one via "Add Files", or import all PDFs from a specific folder at once via "Import files from folder".

The screenshot shows that 4 PDF files have been imported, with the list displaying information such as serial number, name, path, extension, creation time, and modification time. The filenames are 1.pdf, 2.pdf, 3.pdf, 4.pdf, all with the pdf extension, located in a test directory on the D drive.

The purpose of this step is to confirm the scope of files for batch processing. After importing, it is recommended to check the list: first, confirm the number of files is correct; second, confirm the extension is pdf; third, ensure no files that shouldn't be processed were added inadvertently. If a file should not be processed, you can use the delete operation in the list to remove the corresponding record.
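The same scope check can be sketched outside the tool. The following Python example is only an illustration under assumptions: the function name list_pdfs is hypothetical, and the demo runs on a throwaway folder rather than on real contract files.

```python
import tempfile
from pathlib import Path

def list_pdfs(folder):
    """Return the PDF files directly inside folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

# Demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1.pdf", "2.PDF", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    found = [p.name for p in list_pdfs(tmp)]

# Only the two PDF files are listed; notes.txt is left out.
print(len(found), found)
```

Comparing the count and the extensions against what you expect mirrors the three checks recommended above.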

Step 3: Set which text segment to extract from the PDF content

Clicking "Next" leads to "Set processing options". This is the key part of the entire batch renaming workflow. The interface has a "Search Area" option, and the screenshot shows "Custom formula matched text" is selected. This means the software will not simply take the first line of text or a fixed position, but will search for matching text in the PDF content based on the rule we input.

In the "Regular Expression" input box, the example shows "\d{8}". This expression matches a sequence of 8 consecutive digits. Since the contract number is exactly 8 digits, the software will search the PDF content for text matching this rule and use the matched number for renaming.

Further down the same page is the "Position" setting, where the screenshot shows "Overwrite the entire filename" is selected. This means the newly extracted number will directly replace the original filename body. For example, 1.pdf will become 10026877.pdf, rather than adding the number as a prefix or suffix to the original filename.
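The effect of "Overwrite the entire filename" can be pictured with a short Python sketch. The helper name overwrite_name is hypothetical and the code only computes the new name; it is an illustration of the naming rule, not the tool's own implementation.

```python
from pathlib import Path

def overwrite_name(original, new_stem):
    """Replace the whole filename body with new_stem, keeping the extension."""
    original = Path(original)
    return original.with_name(new_stem + original.suffix)

new_path = overwrite_name("1.pdf", "10026877")
print(new_path.name)
```

Because the extension is taken from the original file, the number is neither prefixed nor suffixed to the old name, and no second ".pdf" is appended.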

The expected result of this step is: the software can identify an 8-digit number from each PDF according to the rule and use this number as the new filename for that PDF. If your number is not 8 digits, you need to adjust the expression according to the actual format. For instance, if the number is 6 digits, you can use \d{6}; if it's 10 digits, you can use \d{10}. However, the specific expression should be based on your document content to avoid matching irrelevant numbers.
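Adjusting the rule to a different number length only changes the value inside the braces. The Python sketch below builds the expression for any length; the function name digits_pattern and both sample strings are made up for illustration.

```python
import re

def digits_pattern(length):
    """Compile a pattern for `length` consecutive digits."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return re.compile(r"\d{%d}" % length)

# Hypothetical 6-digit and 10-digit numbers.
six = digits_pattern(6).search("Order 204518 received")
ten = digits_pattern(10).search("Ref: 2026000417 / page 3")

print(digits_pattern(6).pattern, six.group(0))
print(digits_pattern(10).pattern, ten.group(0))
```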

Step 4: Set the save location and start processing

As seen from the workflow at the top of the interface, the subsequent steps are "Set save location" and "Start processing". After completing the rule settings, continue clicking "Next", choose a save location according to the interface prompts, and then proceed to the start processing stage.

For this step, it is recommended to choose an appropriate save method based on the importance of the files. For important contracts or formal archives, it's suggested to output to a new folder first, confirm the naming results are correct, and then replace or archive them. This reduces the risk of operational errors and makes it easier to trace back to the files before and after processing.
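The idea of writing results to a new folder while leaving the originals untouched can be sketched as follows. This is a minimal Python illustration under assumptions: copy_renamed is a hypothetical helper, the folder name "renamed" is arbitrary, and the demo uses a placeholder file in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def copy_renamed(source, new_stem, output_dir):
    """Copy source into output_dir under new_stem; the original stays in place."""
    source = Path(source)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / (new_stem + source.suffix)
    if target.exists():
        # Refuse to overwrite an earlier result with the same number.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)
    return target

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "1.pdf"
    original.write_bytes(b"%PDF-1.4 placeholder")
    copied = copy_renamed(original, "10026877", Path(tmp) / "renamed")
    original_kept = original.exists()
    copied_exists = copied.exists()
    copied_name = copied.name

print(copied_name, original_kept, copied_exists)
```

Keeping both copies until the names have been checked makes it easy to trace a renamed file back to its source.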

After starting the process, the software will batch-read the content of the PDFs in the list, find the corresponding numbers based on the "\d{8}" rule, and change the filenames to those numbers. After processing, you can go back to the folder to view the results. As shown in the earlier result preview, original files like 1.pdf and 2.pdf will be renamed to their corresponding contract numbers.

Guidelines for Setting Wildcards / Regular Expressions

Many users refer to such rules as wildcard expressions. Strictly speaking, the name of the input field in the screenshot is "Regular Expression", which is more suitable for processing fixed-format text than ordinary wildcards. Wildcards are usually used for simple matches, such as an asterisk representing any character; regular expressions can more precisely describe rules like "several consecutive digits", "a number after a certain prefix", or "a code containing letters and numbers".

In this example, the contract number is a sequence of 8 consecutive digits, so \d{8} is used. Here, \d represents a digit, and {8} means it appears exactly 8 times in a row. Combined, it means finding a sequence of 8 consecutive digits. For batch PDF renaming, the advantage of this rule is that you don't need to know the specific number in each file; as long as the number format is consistent, the software can identify them one by one.

Caution: If a PDF contains several sequences of 8 consecutive digits, the software might match any one of them. Under standard regular expression rules, \d{8} also matches the first 8 digits of a longer digit run, such as a phone number. Therefore, before formal batch processing, test with a small number of samples to confirm that the extracted result is indeed the contract number, not a date, phone number, monetary amount, or other digit sequence. If the document contains several similar digit sequences, consider making the rule more specific, for instance by matching on fixed text before or after the number; the exact expression has to be designed around the actual document content.
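To make the risk concrete, the Python sketch below compares the bare rule with a more specific one on made-up page text. The lookbehind and lookahead syntax shown is standard in Python's re module; whether the "Regular Expression" box in HeSoft Doc Batch Tool accepts the same syntax is not stated in this article, so treat it as an assumption to verify on sample files.

```python
import re

# Made-up page text: a date, a phone number, and the contract number.
page_text = (
    "Date: 20260605\n"
    "Tel: 13800001234\n"
    "Contract No. 10026877\n"
)

# The bare rule returns the first 8-digit run, which here is the date.
first_hit = re.search(r"\d{8}", page_text).group(0)

# Requiring the label in front (lookbehind) and no digit after (lookahead)
# makes the whole match the contract number itself.
labelled = re.search(r"(?<=Contract No\. )\d{8}(?!\d)", page_text)
contract_number = labelled.group(0) if labelled else None

# An 11-digit phone number yields a false 8-digit hit with the bare rule,
# but no hit once digits are forbidden on both sides.
phone_hit = re.search(r"\d{8}", "Tel: 13800001234").group(0)
phone_strict = re.search(r"(?<!\d)\d{8}(?!\d)", "Tel: 13800001234")

print(first_hit, contract_number, phone_hit, phone_strict)
```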

FAQ and Precautions

1. Why should I open a PDF to check the number format before processing?

Because batch renaming relies on content matching rules. Opening one or two sample PDFs first to confirm if the contract numbers are all 8 digits, if they all appear in the document, and if there are other numbers of the same length, helps you choose a more accurate expression and avoid naming errors after batch processing.
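This sample check can also be done mechanically once the text of a sample PDF is available as a string. The Python sketch below is an illustration only: candidate_numbers and check_sample are hypothetical names, and obtaining the text from the PDF (for example by copying it from a reader) is assumed to have happened already.

```python
import re

def candidate_numbers(text, length=8):
    """Return every standalone run of exactly `length` digits found in text."""
    pattern = re.compile(r"(?<!\d)\d{%d}(?!\d)" % length)
    return pattern.findall(text)

def check_sample(text, length=8):
    """Classify a sample as 'ok', 'missing', or 'ambiguous'."""
    found = candidate_numbers(text, length)
    if not found:
        return "missing"
    if len(set(found)) > 1:
        return "ambiguous"
    return "ok"

print(check_sample("Contract No. 10026877"))
print(check_sample("Contract No. 10026877, signed 20260605"))
print(check_sample("No number on this page"))
```

A result of "ambiguous" suggests the rule needs to be made more specific before the full batch is run.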

2. If the PDF is a scanned image, can the number be directly recognized?

The screenshots in this article show a PDF whose text content is visible in a PDF reader, renamed via content matching. If the PDF is purely a scanned image, whether text can be extracted depends on whether the file already contains a recognizable text layer. Before processing, try copying the number from the PDF. If it cannot be copied, OCR text recognition may be needed before content-based renaming can work.

3. Will the original sequential names like 1, 2, 3 be retained in the filename?

The "Position" selected in the screenshot is "Overwrite the entire filename", so the original filename body will be replaced by the extracted number. The extension pdf is retained, so the result will be in the form of 10026877.pdf, not 10026877 or 10026877.pdf.pdf.

4. Should I back up files before batch processing?

Backing up is recommended. While batch processing can significantly improve efficiency, when dealing with formal documents like contracts, financial records, and archives, it is best to first copy the files to a test directory, confirm the rules are correct, and then process the formal files. You can also select a small number of PDFs for a trial run first to confirm the resulting filenames meet expectations.
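A trial run can be thought of as building a rename plan and reviewing it before any file changes. The Python sketch below shows the idea under assumptions: plan_renames is a hypothetical function, and the text of each PDF is supplied as a plain string because text extraction itself is outside the scope of this sketch.

```python
import re
from collections import Counter
from pathlib import Path

def plan_renames(texts, pattern=r"\d{8}"):
    """Propose a new name per file without touching any file.

    texts maps each current filename to the text extracted from that file.
    Returns (plan, problems); problems lists files that need attention.
    """
    plan = {}
    for name, text in texts.items():
        match = re.search(pattern, text)
        plan[name] = (match.group(0) + Path(name).suffix) if match else None

    counts = Counter(new for new in plan.values() if new)
    problems = {}
    for name, new in plan.items():
        if new is None:
            problems[name] = "no match"
        elif counts[new] > 1:
            problems[name] = "duplicate target"
    return plan, problems

# Made-up sample: two files share a number and one has no readable text.
sample_texts = {
    "1.pdf": "Contract No. 10026877",
    "2.pdf": "Contract No. 20036655",
    "3.pdf": "Contract No. 10026877 (copy)",
    "4.pdf": "",
}
plan, problems = plan_renames(sample_texts)
print(plan)
print(problems)
```

An empty problems result on the sample files is a reasonable signal that the rule is ready for the full folder.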

5. Does this method only apply to PDFs?

This article demonstrates the "Rename PDF files using file content" function, which is applicable to PDF files. The software interface also shows other office file processing categories like Word Tools and Excel Tools. When processing files like doc, docx, xls, xlsx, you should select the corresponding function entry and not mix different formats in the same PDF processing task.

Summary

Using HeSoft Doc Batch Tool, you can turn PDF organization work that originally required manually opening, viewing, copying, and renaming each file into a set of rule-based batch operations. For contract PDFs, as long as a contract number with a stable format exists in the body text, you can quickly extract the number and overwrite the original filename using the "Rename PDF files using file content" function together with the wildcard-like regular expression \d{8}.

The core value of this method lies in reducing repetitive labor, lowering manual naming errors, and enhancing file retrieval efficiency. Before formally processing a large number of PDFs, it is recommended to select a few sample files to test the expression and naming results; once confirmed correct, batch import the entire folder for processing. This will noticeably improve daily office efficiency for contracts, orders, and scanned copies that require long-term archiving.


Keywords: Batch rename PDFs, rename PDFs using wildcard expressions, extract PDF content with regular expressions, name files by contract number
Creation Time:2026-06-05 09:42:29
