How to use Google Gemini AI to extract text from images and PDFs and convert that text into JSON using Node.js

futuredecode.com

1 month ago

1. Set Up Google Gemini API

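Gemini is Google’s family of multimodal models, available through the Gemini API and Vertex AI. The code in this guide does not call Gemini directly; it extracts text with Google Cloud’s OCR services (the Cloud Vision API, with Document AI as an alternative), so those are the APIs to set up: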

a. Create a Google Cloud Project:

Go to the Google Cloud Console.
Create a new project.

b. Enable the Vision API or Document AI API:

In the Google Cloud Console, navigate to APIs & Services > Library.
Search for Vision API or Document AI API.
Click Enable.

c. Set Up Authentication:

In the Cloud Console, go to APIs & Services > Credentials.
Click Create credentials and select Service account.
Follow the prompts to create a new service account and download the JSON key file to your computer.
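
Before wiring the key into an application, it can help to confirm that the downloaded file really is a service account key. The sketch below is a minimal local check, assuming the usual key layout with type, project_id, private_key and client_email fields. The helper name checkServiceAccountKey is hypothetical and not part of any Google library, and the check never contacts Google Cloud.

```javascript
// Hypothetical helper: sanity-check a parsed service account key object.
// Assumes the usual key file fields; it only inspects the object locally.
function checkServiceAccountKey(key) {
  if (key === null || typeof key !== "object") {
    return { ok: false, reason: "not a JSON object" };
  }
  const required = ["type", "project_id", "private_key", "client_email"];
  const missing = required.filter(
    (field) => typeof key[field] !== "string" || key[field].length === 0
  );
  if (missing.length > 0) {
    return { ok: false, reason: `missing fields: ${missing.join(", ")}` };
  }
  if (key.type !== "service_account") {
    return { ok: false, reason: `unexpected key type: ${key.type}` };
  }
  return { ok: true, reason: "" };
}

// Usage: parse the downloaded file and check it before starting the server.
// const key = JSON.parse(
//   require("fs").readFileSync("path/to/your/service-account-key.json", "utf8")
// );
// console.log(checkServiceAccountKey(key));
```

Keep the key file out of version control, since anyone holding it can act as the service account.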

2. Install Necessary Node.js Libraries

Install Express, Multer (for handling file uploads), and the Cloud Vision client library. Document AI has its own client package (@google-cloud/documentai), which this example does not use:

npm install express multer @google-cloud/vision

3. Write Node.js Code to Extract Text and Convert to JSON

Here’s an example script using Google Cloud’s Vision API to extract text from images and PDFs:

// Import necessary libraries
const express = require("express");
const multer = require("multer");
const vision = require("@google-cloud/vision");
const fs = require("fs");

// Initialize the Express app
const app = express();
const port = 3000;

// Set up Multer for file uploads
const upload = multer({ dest: "uploads/" });

// Set up Google Cloud Vision client with service account credentials
const client = new vision.ImageAnnotatorClient({
  keyFilename: "path/to/your/service-account-key.json",
});

// Function to extract text from an image
async function extractTextFromImage(imagePath) {
  const fileBuffer = fs.readFileSync(imagePath);
  const imageBase64 = fileBuffer.toString("base64");

  const [result] = await client.textDetection({
    image: { content: imageBase64 },
  });

  // The first annotation holds the full detected text
  const detections = result.textAnnotations;
  return detections[0] ? detections[0].description : "";
}

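// Function to extract text from a PDF.
// documentTextDetection expects an image request, so the PDF is sent
// through batchAnnotateFiles instead. The synchronous call only handles
// a handful of pages per request; larger PDFs need the asynchronous,
// Cloud Storage based variant.
async function extractTextFromPDF(pdfPath) {
  const fileBuffer = fs.readFileSync(pdfPath);
  const pdfBase64 = fileBuffer.toString("base64");

  const [result] = await client.batchAnnotateFiles({
    requests: [
      {
        inputConfig: { mimeType: "application/pdf", content: pdfBase64 },
        features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
      },
    ],
  });

  // One response per processed page; join the page texts in order
  const pages = result.responses[0].responses || [];
  return pages
    .map((page) => (page.fullTextAnnotation ? page.fullTextAnnotation.text : ""))
    .join("\n");
}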

// API endpoint to upload files and extract text
app.post("/upload", upload.single("file"), async (req, res) => {
  if (!req.file) {
    return res.status(400).send("No file uploaded.");
  }

  const filePath = req.file.path;

  try {
    let extractedText;

    if (req.file.mimetype.startsWith("image/")) {
      extractedText = await extractTextFromImage(filePath);
    } else if (req.file.mimetype === "application/pdf") {
      extractedText = await extractTextFromPDF(filePath);
    } else {
      return res
        .status(400)
        .send("Invalid file type. Only images and PDFs are supported.");
    }

    // Return extracted text as JSON
    res.json({ extractedText });
  } catch (error) {
    console.error("Error processing file:", error);
    res.status(500).send("An error occurred while processing the file.");
  } finally {
    // Delete the uploaded file whether processing succeeded, failed,
    // or the file type was rejected
    fs.unlink(filePath, (err) => {
      if (err) console.error("Could not delete uploaded file:", err);
    });
  }
});

// Start the server
app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});
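
To try the endpoint, a file has to be sent as multipart form data under the field name file, which is what upload.single("file") expects. The following client is a rough sketch, assuming Node.js 18 or later (for the built-in fetch, FormData and Blob) and the server above running on localhost:3000. The names buildUploadForm and uploadFile are illustrative only.

```javascript
const fs = require("fs");
const path = require("path");

// Build the multipart form the /upload route expects (field name "file").
function buildUploadForm(buffer, filename, mimeType) {
  const form = new FormData();
  form.append("file", new Blob([buffer], { type: mimeType }), filename);
  return form;
}

// Send a local file to the server and return the parsed JSON response.
async function uploadFile(filePath, mimeType, url = "http://localhost:3000/upload") {
  const form = buildUploadForm(
    fs.readFileSync(filePath),
    path.basename(filePath),
    mimeType
  );
  const response = await fetch(url, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (with the server running):
// uploadFile("invoice.png", "image/png")
//   .then((data) => console.log(data.extractedText))
//   .catch(console.error);
```

The MIME type passed here is what the server sees in req.file.mimetype, so it decides whether the image or the PDF branch runs.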

Explanation:

Google Vision API Client: Used to set up the connection with the Vision API using the credentials.
extractTextFromImage: Reads an image file and converts it to a base64 string. Then, it uses the Vision API to detect text in the image.
extractTextFromPDF: Reads a PDF file and converts it to a base64 string. Then, it uses the Vision API to perform text detection on the document.
JSON Conversion: After extraction, the text is wrapped in a JSON object ({ extractedText }) and returned in the HTTP response. The script does not write an output file.
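
The response wraps the raw text in a single field. If the goal is structured JSON, the text needs a further parsing step, and what that looks like depends entirely on the documents involved. As a minimal sketch, assuming the OCR output contains lines of the form "Label: value", a hypothetical helper textToJson could split the text into lines and collect those pairs. The sample text is made up for illustration.

```javascript
// Hypothetical helper: turn raw OCR text into a simple JSON-friendly object.
// Assumes "Label: value" lines; anything else is kept only in `lines`.
function textToJson(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const fields = {};
  for (const line of lines) {
    // Split on the first colon so values such as "10:30" stay intact
    const separator = line.indexOf(":");
    if (separator > 0) {
      const label = line.slice(0, separator).trim();
      const value = line.slice(separator + 1).trim();
      if (label.length > 0 && value.length > 0) {
        fields[label] = value;
      }
    }
  }
  return { lines, fields };
}

// Illustrative input, not real OCR output
const sample = "Invoice No: 1042\nDate: 2024-05-01\n\nThank you for your business";
console.log(JSON.stringify(textToJson(sample), null, 2));
```

In the Express handler, res.json(textToJson(extractedText)) could then take the place of res.json({ extractedText }).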

Important Notes:

Service Account Key: Ensure the path to your service account key file (path/to/your/service-account-key.json) is correct.
Quota and Costs: Be aware of the costs and quotas associated with using Google Cloud APIs. Make sure your Google Cloud billing is set up correctly.
Google Vision vs. Document AI: For images, use Google Vision API; for more complex documents, consider using Google Document AI for better results.
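
Since every request to the API counts against quota and may be billed, it can be worth rejecting unsuitable uploads before calling it. The sketch below shows one way to do that. The 10 MB cap is an arbitrary illustrative value rather than a documented limit, and checkUpload is a hypothetical helper that reads the mimetype and size fields Multer sets on req.file.

```javascript
// Arbitrary illustrative cap; adjust to your own budget and API limits.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: decide whether an uploaded file should be sent for OCR.
// `file` is expected to look like Multer's req.file ({ mimetype, size }).
function checkUpload(file) {
  if (!file) {
    return { ok: false, status: 400, message: "No file uploaded." };
  }
  const isImage = file.mimetype.startsWith("image/");
  const isPdf = file.mimetype === "application/pdf";
  if (!isImage && !isPdf) {
    return {
      ok: false,
      status: 400,
      message: "Invalid file type. Only images and PDFs are supported.",
    };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, status: 413, message: "File is too large." };
  }
  return { ok: true, status: 200, message: "", kind: isPdf ? "pdf" : "image" };
}

// Usage inside a route handler:
// const check = checkUpload(req.file);
// if (!check.ok) return res.status(check.status).send(check.message);
```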
