In this article, we will show you how to automate data extraction from PDFs directly in your browser using a spreadsheet tool called Lido. Simply follow the steps below.
We will be using Lido, which is a spreadsheet created to automate and streamline repetitive tasks. You can create an account using this link: https://www.lido.app/go/signup.
In this method, we will extract data using the PDF importer in the File menu.
After setting up your Lido account, locate and click on the "New File" button. This action will open a blank spreadsheet in Lido, giving you a fresh workspace to begin your project.
Access the PDF importing tool by navigating to the File menu in your Lido interface. This tool is specifically designed to facilitate the conversion of information from PDFs into a usable spreadsheet format, enabling seamless data integration.
Use the upload function in the PDF importer to select and upload the PDF file from your computer. This step prepares the file for data extraction by loading it into the Lido system.
Once your PDF is uploaded, you'll have the opportunity to select the specific area or pages from which you want to extract data. After making your selection, click the "Extract data" button to initiate the extraction process.
The extracted data is inserted into your spreadsheet starting at the currently selected cell. The PDF importer converts the selection into a spreadsheet-friendly layout: when the selected section is purely text, each line of text is placed into its own cell.
When the selection includes tables, the table data is extracted into cells. For selections that contain both tables and text, only the table data is extracted and the text is discarded. To extract more data from the PDF, click "Back" now. To finish, close the window by clicking the "X" icon in the top right corner.
This final step confirms that the data extraction and insertion processes are complete. Your spreadsheet now contains the data from the PDF, organized according to your selections, and is ready for further analysis or manipulation.
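The placement rules described for the PDF importer can be modeled with a short Python sketch. This is only an illustration of the behavior stated above, not Lido's implementation: the `Block` structure and the function name `cells_from_selection` are hypothetical, and stacking text lines vertically in a single column is an assumption.

```python
from typing import List, Union

# Hypothetical model of a selection: a str is a run of plain text,
# a list of rows is a table.
Table = List[List[str]]
Block = Union[str, Table]


def cells_from_selection(blocks: List[Block]) -> List[List[str]]:
    """Return the rows of cells a selection would produce under the stated rules."""
    tables = [b for b in blocks if not isinstance(b, str)]
    if tables:
        # Tables present: keep the table rows only and drop any surrounding text.
        return [row for table in tables for row in table]
    # Text only: each line goes into its own cell (assumed: one cell per row).
    rows: List[List[str]] = []
    for text in blocks:
        for line in text.splitlines():
            rows.append([line])
    return rows


print(cells_from_selection(["Invoice 1042\nDue: 2024-05-01"]))
# [['Invoice 1042'], ['Due: 2024-05-01']]
print(cells_from_selection(["Summary", [["Item", "Qty"], ["Pen", "3"]]]))
# [['Item', 'Qty'], ['Pen', '3']]
```

The second call shows why a mixed selection can look incomplete: the "Summary" text is dropped because a table is present, so text and tables are best selected separately.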
In this method, we'll use Lido's unique function, IMPORTPDF, to automatically extract all the content from the given PDF at once. It's important to mention that IMPORTPDF doesn't work well with scanned PDFs. For those, you should consider using the third method mentioned below, which makes use of the EXTRACTTABLESFROMPDF function.
Log in to your Google Drive account and upload the PDF file from which you want to extract data. Ensure the PDF is stored in an easily accessible location in your Drive for later retrieval.
Open Lido and create a new file by clicking on the "New File" option. This new file will serve as the destination for the data you're about to extract from the PDF.
In your new Lido file, add a new worksheet by clicking the plus icon. You will enter the formula in this worksheet; the extracted data itself will be inserted into Sheet1, as specified later in the formula.
In the first cell (A1) of your new worksheet, begin typing the formula "=IMPORTPDF(" to start importing data from a PDF file.
To allow Lido to access the PDF file in your Google Drive, select "Add Credential" and follow the necessary steps to securely link your Google account with Lido.
After linking your account, continue the formula by typing a comma and then click on "Select a file" to choose the specific PDF from which you want to extract data.
In the file selector, navigate to and select the PDF file you previously uploaded to Google Drive. This is the file from which data will be extracted.
Complete the formula by specifying the destination for the extracted data, which is "Sheet1!B2", then press ENTER to finalize the formula. This last parameter of the IMPORTPDF function determines where the extracted data is inserted: here, in worksheet Sheet1, starting at cell B2.
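To make "starting at cell B2" concrete, here is a minimal Python sketch of how a destination such as "Sheet1!B2" can be read as an anchor cell. It is not Lido code: `parse_destination` and `place_block` are hypothetical helpers, and the idea that extracted rows fill to the right and downward from the anchor is the usual spreadsheet convention, assumed here.

```python
import re
from typing import Dict, List, Tuple


def parse_destination(ref: str) -> Tuple[str, int, int]:
    """Split a reference like 'Sheet1!B2' into (sheet, row, column), 1-based."""
    match = re.fullmatch(r"([^!]+)!([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a sheet!cell reference: {ref!r}")
    sheet, letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters are base-26 digits: A=1 ... Z=26, AA=27, and so on.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return sheet, int(digits), column


def place_block(ref: str, rows: List[List[str]]) -> Dict[Tuple[int, int], str]:
    """Map each value to a (row, column) position, anchored at ref."""
    _, top, left = parse_destination(ref)
    return {
        (top + r, left + c): value
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
    }


print(parse_destination("Sheet1!B2"))
# ('Sheet1', 2, 2)
print(place_block("Sheet1!B2", [["Item", "Qty"], ["Pen", "3"]]))
# {(2, 2): 'Item', (2, 3): 'Qty', (3, 2): 'Pen', (3, 3): '3'}
```

Under this assumption, a two-by-two table would occupy B2:C3, so anything already in that range of Sheet1 is worth moving before running the action.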
Right-click on cell A1 where you entered the formula and select "Run action" from the context menu to execute the IMPORTPDF function and begin the data extraction process.
After running the action, navigate to Sheet1 to verify that the data has been correctly and completely extracted from the PDF and is properly formatted in the specified cells. This step is important to confirm the accuracy and completeness of the data extraction.
In this approach, we will employ Lido's unique formula, EXTRACTTABLESFROMPDF, which is designed to extract anything it identifies as a table from the PDF. This formula is effective on scanned documents.
Log into your Google Drive and upload the specific PDF file from which you need to extract table data. Make sure the file is stored in a location within your Drive that is easy to access later.
Open Lido and create a new file by selecting the "New File" option. This file will be used to store and work with the data you extract from your PDF.
Add a new worksheet to your Lido file by clicking the plus icon. You will enter the formula in this worksheet; the extracted table data will be inserted into Sheet1, as specified later in the formula.
In cell A1 of the new worksheet, start entering the formula "=EXTRACTTABLESFROMPDF(" to begin extracting tables from your PDF.
Click on "Add Credential" to link your Google Drive account with Lido, enabling Lido to access the PDF file you intend to extract data from. Follow the prompted steps to ensure a secure connection.
After setting up the credentials, proceed with the formula by typing a comma, then click on "Select a file" to bring up a file selector where you can choose the PDF file uploaded earlier.
Find and select the uploaded PDF file in the file selector. This is the document from which the table data will be extracted.
Complete the formula by specifying the location in the spreadsheet where the extracted data should be placed, which is “Sheet1!B2”. This parameter ensures the data is inserted starting at cell B2 in Sheet1. Press ENTER to apply the formula.
To execute the EXTRACTTABLESFROMPDF function, right-click on cell A1 where your formula is entered and select "Run action". This will start the process of extracting table data from the PDF.
After running the formula, check Sheet1 to confirm that the tables from the PDF have been extracted accurately and are displayed correctly. Keep in mind that this formula captures only table data; it does not extract non-table content.
To extract data that isn't in tables, consider using method 1 or method 2 instead.
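The guidance on choosing a method can be summarized in a small Python sketch. It encodes only the two limitations stated in this article (IMPORTPDF does not work well with scanned PDFs, and EXTRACTTABLESFROMPDF captures only tables). The function name and method labels are hypothetical, and a method appearing in the result means only that the article does not rule it out; for example, the article does not say how the File menu importer handles scans.

```python
from typing import List


def candidate_methods(scanned: bool, needs_non_table_text: bool) -> List[str]:
    """List the methods this article does not rule out for a given PDF."""
    methods = ["File menu importer", "IMPORTPDF", "EXTRACTTABLESFROMPDF"]
    if scanned:
        # Stated limitation: IMPORTPDF doesn't work well with scanned PDFs.
        methods.remove("IMPORTPDF")
    if needs_non_table_text:
        # Stated limitation: EXTRACTTABLESFROMPDF extracts only tables.
        methods.remove("EXTRACTTABLESFROMPDF")
    return methods


print(candidate_methods(scanned=True, needs_non_table_text=False))
# ['File menu importer', 'EXTRACTTABLESFROMPDF']
print(candidate_methods(scanned=False, needs_non_table_text=True))
# ['File menu importer', 'IMPORTPDF']
```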
We hope that you now have a better understanding of how to automate data extraction from PDFs.