In this article, we will show you three methods for extracting data from a PDF in your browser. Simply follow the steps below.
We will use Lido, a tool designed to automate repetitive spreadsheet tasks. You can get set up by going to the following link: https://www.lido.app/go/signup.
Open Lido and create a new spreadsheet. This blank spreadsheet is where you will place the data you extract from the PDF, so you can organize and analyze it.
In your Lido spreadsheet, navigate to the File menu and select the “Import from PDF” option. This activates the PDF Importer, a tool specifically designed to convert information from PDFs into a structured spreadsheet format.
Select and upload the PDF file from which you need to extract data. Ensure you have the correct file that contains the information you need.
Once your PDF is uploaded, use the selection tool to define precisely which area of the PDF you want to convert, whether text or tables. Select only the area that contains the data you need, then click "Extract data" to begin the extraction process.
Inspect the extracted data shown in the preview to ensure it matches what you intended to extract, whether it's text lines or tabular data.
If everything is good, click “Insert at active cell” to transfer this data into the spreadsheet.
If the area contains both tabular data and text, remember that the tool prioritizes tabular data, so text outside tables may be ignored. If you need to adjust your selection or extract more data, click “Back.”
Review your Lido spreadsheet to confirm that the data appears as expected and is correctly formatted. Each piece of information should sit in its own cell, matching the structure of the original PDF. You can now save your work or continue working with the data as needed.
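For larger extractions, the review can be partly automated outside Lido. The following is a minimal Python sketch, not a Lido feature: it assumes you have copied or exported the extracted data as CSV text (an assumption, since this article does not cover exporting), and the helper name `find_ragged_rows` is hypothetical. It flags rows whose column count differs from the first row, which often indicates that a table row was split or merged during extraction.

```python
import csv
import io


def find_ragged_rows(csv_text: str) -> list[tuple[int, int]]:
    """Return (row_number, column_count) for every row whose width
    differs from the first row. Row numbers start at 1."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    expected = len(rows[0])  # the first row sets the expected width
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]


# Hypothetical extracted data: the third row is missing its Price column.
sample = "Item,Qty,Price\nPen,3,1.50\nNotebook,2\n"
print(find_ragged_rows(sample))  # [(3, 2)]
```

An empty result means every row has the same number of columns as the first row; it does not confirm that the values themselves are correct, so a visual check against the PDF is still worthwhile.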
In this second method, we will use Lido’s custom formula IMPORTPDF, which extracts all content from the provided PDF in one go. Note that IMPORTPDF is not effective with scanned PDF documents; if you're working with a scanned PDF, use method 3 below, which relies on the EXTRACTTABLESFROMPDF formula instead.
Log in to your Google Drive account and upload the PDF file you want to work with. The PDF must be stored in Google Drive so that Lido’s IMPORTPDF formula can access it.
Open Lido and set up a new, blank spreadsheet. This serves as your working area where the PDF data will be imported.
Add a new worksheet within your Lido spreadsheet by clicking the "+" icon. This is where you'll insert the IMPORTPDF formula and eventually view the extracted data.
Go to the newly created worksheet and type "=IMPORTPDF(" into cell A1. This sets the stage for linking your PDF.
Follow the "Add Credential" prompt, which authorizes Lido to access your Google Drive. Lido needs this authorization to read the PDF you uploaded.
After adding your Google Drive credentials, continue the formula by typing a comma and then clicking "Select a file" to browse and choose the appropriate PDF from your Google Drive.
Navigate through your files to locate the previously uploaded PDF and select it. This links the PDF directly to your formula.
The final argument of the IMPORTPDF formula determines where the extracted data should be placed. In this case, we are specifying that the data should be placed in worksheet Sheet1, starting at cell B2.
Once your formula is set, right-click on cell A1 where you entered the formula and select "Run action" from the menu. This will execute the formula and start the data extraction process.
Switch to "Sheet1" to check the results. Ensure that the data from the PDF has been correctly extracted and is accurately displayed starting from cell B2 as intended.
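To make "starting at cell B2" concrete, here is a minimal Python sketch of how a small table maps onto cell addresses from a starting cell. It illustrates the layout only and is not Lido's implementation: it assumes the extracted data fills cells to the right and downward with the starting cell as the top-left corner, and the helper names (`place_table`, `column_to_index`, `index_to_column`) are hypothetical.

```python
import re


def column_to_index(letters: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_column(index: int) -> str:
    """Convert a 1-based column index back to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def place_table(rows: list[list[str]], anchor: str = "B2") -> dict[str, str]:
    """Map each value in `rows` to a cell address, treating `anchor`
    as the top-left cell of the placed table."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", anchor)
    if match is None:
        raise ValueError(f"Not a valid cell reference: {anchor!r}")
    start_col = column_to_index(match.group(1))
    start_row = int(match.group(2))
    cells = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            cells[f"{index_to_column(start_col + c)}{start_row + r}"] = value
    return cells


# Hypothetical 2x2 table placed at B2.
table = [["Item", "Qty"], ["Pen", "3"]]
print(place_table(table))
# {'B2': 'Item', 'C2': 'Qty', 'B3': 'Pen', 'C3': '3'}
```

Under that assumption, the first value of the PDF should appear in B2, and column A and row 1 of Sheet1 stay empty. This is a quick way to tell whether the data landed where the formula's final argument said it should.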
In this third method, we will use Lido’s custom formula EXTRACTTABLESFROMPDF, which extracts everything in the PDF that it recognizes as a table. Unlike IMPORTPDF, this formula works on scanned documents.
Start by logging in to your Google Drive account and uploading the PDF document that contains the tables you want to extract. This makes the file accessible to Lido’s tools through your connected account.
Go to Lido and create a new spreadsheet from the files page. This spreadsheet is where the extracted table data will be stored and managed.
Add a fresh worksheet in your Lido spreadsheet by clicking on the plus icon located at the top left. This will be the workspace for entering your formula and viewing the extracted data.
In the new worksheet, type "=EXTRACTTABLESFROMPDF(" into cell A1. This prepares the cell for the subsequent steps that link and process your PDF.
Next, connect Lido to your Google Drive by adding your credentials. Follow the on-screen instructions to authorize Lido, which gives it access to your uploaded PDF.
After establishing the connection, continue the formula by typing a comma to proceed to the next part, then click “Select a file” to open a file picker dialog. This allows you to navigate through your Google Drive to find your PDF.
Locate and select the previously uploaded PDF within your Google Drive. This is the file from which the tables will be extracted.
The final argument of the EXTRACTTABLESFROMPDF formula determines where the extracted data should be placed. In this case, we are specifying that the data should be placed in worksheet Sheet1, starting at cell B2.
With the formula ready, right-click on cell A1 and choose “Run action” from the context menu. This command triggers the formula to start extracting table data from the PDF.
Finally, go to "Sheet1" to verify the results. Check that the tables from the PDF have been accurately extracted and are correctly displayed starting at cell B2.
Remember, this method extracts only tabular data; for non-tabular text, consider method 1 or method 2 instead.
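The guidance spread across the three methods can be condensed into a small decision helper. The Python sketch below only restates the rules of thumb from this article; the function name `suggest_method` and its two parameters are hypothetical, and it is not part of Lido.

```python
def suggest_method(scanned: bool, whole_document: bool) -> str:
    """Suggest one of the article's three methods.

    scanned:        True if the PDF is a scanned document.
    whole_document: True if you want everything in the PDF rather than
                    a hand-selected area.
    """
    if scanned:
        # IMPORTPDF is not effective with scanned PDFs, so the article
        # points scanned documents to EXTRACTTABLESFROMPDF (tables only).
        return "Method 3: EXTRACTTABLESFROMPDF"
    if whole_document:
        # IMPORTPDF extracts all content from the PDF in one go.
        return "Method 2: IMPORTPDF"
    # The PDF Importer lets you select the exact area to convert.
    return "Method 1: PDF Importer"


print(suggest_method(scanned=True, whole_document=True))
# Method 3: EXTRACTTABLESFROMPDF
print(suggest_method(scanned=False, whole_document=False))
# Method 1: PDF Importer
```

Keep in mind that method 3 returns only what it recognizes as tables, so any plain text in a scanned PDF will not be captured by that suggestion.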
We hope that you now have a better understanding of how to extract data from a PDF.