How to Convert Speech to Text Using Node.js & AWS Transcribe Service

futuredecode.com

3 months ago

CREATE A NODE.JS APPLICATION FOR CONVERTING SPEECH TO TEXT USING AWS TRANSCRIBE SERVICE: STEP-BY-STEP GUIDE

First, install Node.js on your system.

It comes with npm by default.

You also need an AWS account.

Next, create an S3 bucket to hold the audio file. Transcribe reads the file from S3, converts the speech to text, and stores the transcript in the same S3 bucket.

For this to work, you need to attach a policy to the S3 bucket.

Set up the S3 bucket policy (covered in a separate guide) before continuing.
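The round trip described above (audio in, transcript out, same bucket) comes down to two S3 locations per job. The sketch below is a minimal illustration, assuming Transcribe's default output naming: when `OutputBucketName` is set without an `OutputKey`, the transcript is written as `<jobName>.json` at the bucket root. `transcribeLocations` and the sample bucket and job names are hypothetical.

```javascript
// Hypothetical helper: derive the input and output S3 locations for one job.
// Assumes default Transcribe output naming (<jobName>.json at the bucket root).
function transcribeLocations(bucket, audioKey, jobName) {
  return {
    // Input: the s3://bucket/key form used for Media.MediaFileUri
    mediaFileUri: `s3://${bucket}/${audioKey}`,
    // Output: where the transcript JSON lands in the same bucket
    transcript: { Bucket: bucket, Key: `${jobName}.json` },
  };
}

const loc = transcribeLocations('my-audio-bucket', 'clip.wav', 'job-1');
console.log(loc.mediaFileUri); // s3://my-audio-bucket/clip.wav
console.log(loc.transcript.Key); // job-1.json
```

Keep the bucket in the same AWS Region you send Transcribe requests to; the Transcribe documentation requires the input media bucket to be in the request's Region.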

Step 1: Set Up Your Project

1. Initialize your Node.js project:

Copy and paste the following commands into your terminal:

mkdir my-audio-app
cd my-audio-app
npm init -y

2. Install required packages:

Copy and paste the following command into your terminal:

npm install express multer aws-sdk uuid dotenv cors

Step 2: Create the Application Files

Create the main server file:

Create a file named server.js (or app.js, if you prefer).

Set up environment variables:

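Create a .env file in your project root directory and add the following, replacing the placeholder values with your own region and credentials:

AWS_REGION=your-aws-region
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key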
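server.js reads AWS_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment at startup, so it can help to fail fast when one of them is missing instead of getting a less obvious AWS error later. A minimal sketch; `missingEnvVars` is a hypothetical helper, and the example passes a plain object where server.js would pass `process.env`:

```javascript
// Hypothetical helper: return the names of required variables that are unset or empty.
const REQUIRED_VARS = ['AWS_REGION', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'];

function missingEnvVars(env, names = REQUIRED_VARS) {
  return names.filter((name) => !env[name] || env[name].trim() === '');
}

// Example with a partial configuration object instead of process.env
const missing = missingEnvVars({ AWS_REGION: 'us-east-1', AWS_ACCESS_KEY_ID: '' });
console.log(missing); // [ 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY' ]
```

In server.js, this check would run right after `require('dotenv').config()`, logging the missing names and calling `process.exit(1)` if the returned array is not empty.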

Step 3: Write the Code

// server.js
const express = require('express');
const multer = require('multer');
const AWS = require('aws-sdk');
const { v4: uuidv4 } = require('uuid');
const cors = require('cors');
require('dotenv').config(); // Load environment variables from .env

const app = express();
const port = 3000;
const BUCKET_NAME = 'your-bucket-name'; // Replace with your S3 bucket name

// CORS middleware
app.use(cors());

// AWS configuration (values come from the .env file)
AWS.config.update({
  region: process.env.AWS_REGION,
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
});

const s3 = new AWS.S3();
const transcribe = new AWS.TranscribeService();

// Middleware to parse JSON request bodies
app.use(express.json());

// Multer setup for in-memory storage
const storage = multer.memoryStorage();
const upload = multer({ storage: storage });

// Route for uploading audio files
app.post('/api/upload', upload.single('audio'), (req, res) => {
  if (!req.file) {
    return res.status(400).send('No file uploaded.');
  }

  const fileName = `${uuidv4()}-${req.file.originalname}`;
  const uploadParams = {
    Bucket: BUCKET_NAME,
    Key: fileName,
    Body: req.file.buffer,
    ContentType: req.file.mimetype,
  };

  s3.upload(uploadParams, (err) => {
    if (err) {
      console.error('Error uploading file:', err);
      return res.status(500).send('Failed to upload audio file.');
    }
    // Return the object as an s3://bucket/key URI, the form used for
    // Transcribe's MediaFileUri (data.Location would be an https URL)
    res.status(200).json({ fileUrl: `s3://${BUCKET_NAME}/${fileName}` });
  });
});

// Route for starting a speech-to-text transcription job
app.post('/api/transcribe', (req, res) => {
  const fileUrl = req.body.fileUrl;
  if (!fileUrl) {
    return res.status(400).send('fileUrl is required.');
  }

  const jobName = `YourJobName-${Date.now()}`;
  const jobParams = {
    TranscriptionJobName: jobName,
    LanguageCode: 'en-US',
    Media: {
      MediaFileUri: fileUrl,
    },
    MediaFormat: 'wav', // Must match the format of the uploaded audio
    OutputBucketName: BUCKET_NAME,
  };

  transcribe.startTranscriptionJob(jobParams, (err) => {
    if (err) {
      console.error('Error starting transcription job:', err);
      return res.status(500).send('Failed to start transcription job.');
    }

    const checkJobStatus = () => {
      transcribe.getTranscriptionJob({ TranscriptionJobName: jobName }, (err, data) => {
        if (err) {
          console.error('Error getting transcription job status:', err);
          return res.status(500).send('Failed to get transcription job status.');
        }

        const job = data.TranscriptionJob;
        if (job.TranscriptionJobStatus === 'COMPLETED') {
          // The transcript is saved as <jobName>.json in the output bucket. That
          // object is private, so read it through the SDK rather than fetching
          // job.Transcript.TranscriptFileUri over unauthenticated HTTP.
          s3.getObject({ Bucket: BUCKET_NAME, Key: `${jobName}.json` }, (err, obj) => {
            if (err) {
              console.error('Error fetching transcript:', err);
              return res.status(500).send('Failed to fetch transcript.');
            }
            try {
              const json = JSON.parse(obj.Body.toString('utf-8'));
              res.status(200).json({ transcript: json.results.transcripts[0].transcript });
            } catch (parseErr) {
              console.error('Error parsing transcript:', parseErr);
              res.status(500).send('Failed to parse transcript.');
            }
          });
        } else if (job.TranscriptionJobStatus === 'FAILED') {
          console.error('Transcription job failed:', job.FailureReason);
          res.status(500).send('Transcription job failed.');
        } else {
          setTimeout(checkJobStatus, 5000); // Check status again in 5 seconds
        }
      });
    };

    checkJobStatus(); // Start polling for job status
  });
});

// Start the server
app.listen(port, () => {
  console.log(`Server running at http://localhost:${port}/`);
});
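The COMPLETED branch of the transcription route reads the text from `results.transcripts[0].transcript` in the transcript JSON. A slightly more defensive version of that step is sketched below; `extractTranscript` is a hypothetical helper, and the sample object is a trimmed, made-up example that keeps only the fields this code reads (real output also carries word-level `items` and other metadata).

```javascript
// Hypothetical helper: pull the full transcript text out of a Transcribe result.
// Throws if the expected results.transcripts[0].transcript path is missing.
function extractTranscript(resultJson) {
  const transcripts = resultJson && resultJson.results && resultJson.results.transcripts;
  if (!Array.isArray(transcripts) || transcripts.length === 0) {
    throw new Error('Transcript JSON has no results.transcripts entry');
  }
  return transcripts[0].transcript;
}

// Trimmed, illustrative sample (not real service output)
const sample = {
  jobName: 'YourJobName-1700000000000',
  results: { transcripts: [{ transcript: 'Hello from the test recording.' }] },
};
console.log(extractTranscript(sample)); // Hello from the test recording.
```

Throwing a descriptive error here lets the route's error handling report a malformed result instead of failing with a generic `TypeError`.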

CODE INFO:

This Node.js application sets up an Express server that accepts audio file uploads and converts them from speech to text using AWS services. Here’s a breakdown of the code:

Dependencies and Configuration: The code uses Express for the server, Multer for handling file uploads, the AWS SDK for interacting with AWS services (S3 for file storage and Transcribe for transcription), and uuid for generating unique file names. Environment variables are loaded from a .env file using dotenv.
CORS and Middleware: CORS middleware allows cross-origin requests. JSON middleware parses incoming JSON data.
File Upload Endpoint (/api/upload): Multer stores uploaded audio files in memory. The file is then uploaded to an S3 bucket with a unique name. The S3 URL of the uploaded file is returned to the client.
Transcription Endpoint (/api/transcribe): This endpoint starts a transcription job using AWS Transcribe. It polls the job status until it is completed or fails. Upon completion, it fetches the transcript from the S3 bucket and returns it to the client.
Server Setup: The server listens on port 3000 and logs a message confirming that it is running.
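The transcription endpoint polls with nested callbacks and a 5-second `setTimeout`, with no upper limit on how long it waits. The same loop can be written as a promise-based helper with a cap on attempts, so a stuck job does not hold the request open indefinitely. A minimal sketch: `pollJob` and its `intervalMs`/`maxAttempts` options are hypothetical, and `getStatus` stands in for any async function that resolves to a Transcribe job status string (QUEUED, IN_PROGRESS, FAILED or COMPLETED).

```javascript
// Hypothetical helper: poll getStatus() until it reports COMPLETED or FAILED,
// giving up after maxAttempts checks.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollJob(getStatus, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'COMPLETED' || status === 'FAILED') {
      return status;
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs); // wait before checking again
    }
  }
  throw new Error(`Job still running after ${maxAttempts} checks`);
}

// Usage with a fake status source: QUEUED, then IN_PROGRESS, then COMPLETED
const statuses = ['QUEUED', 'IN_PROGRESS', 'COMPLETED'];
let calls = 0;
const fakeGetStatus = async () => statuses[Math.min(calls++, statuses.length - 1)];

const demo = pollJob(fakeGetStatus, { intervalMs: 10 }).then((status) => {
  console.log(status, 'after', calls, 'checks'); // COMPLETED after 3 checks
  return status;
});
```

With AWS SDK v2, `getStatus` could be `async () => (await transcribe.getTranscriptionJob({ TranscriptionJobName: jobName }).promise()).TranscriptionJob.TranscriptionJobStatus`, since v2 request objects expose `.promise()`.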

This setup enables uploading audio files, triggering transcription, and retrieving the speech-to-text results.
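The two endpoints can be exercised from a small Node.js client. The sketch below assumes Node.js 18+ (for the global `fetch`, `FormData` and `Blob`), the server running on localhost:3000, and a WAV file; `uploadAndTranscribe` and its options are hypothetical. It posts the file under the field name `audio` (matching `upload.single('audio')`) and passes the returned `fileUrl` to `/api/transcribe`.

```javascript
// Hypothetical client for the /api/upload and /api/transcribe endpoints.
// Needs Node.js 18+ for the global fetch, FormData and Blob.
async function uploadAndTranscribe(audioBuffer, fileName, options = {}) {
  const {
    baseUrl = 'http://localhost:3000',
    mimeType = 'audio/wav', // assumed; match your actual file type
    fetchImpl = fetch, // injectable so the flow can be tested offline
  } = options;

  // 1. Upload the file as multipart/form-data under the field name 'audio'
  const form = new FormData();
  form.append('audio', new Blob([audioBuffer], { type: mimeType }), fileName);
  const uploadRes = await fetchImpl(`${baseUrl}/api/upload`, { method: 'POST', body: form });
  if (!uploadRes.ok) throw new Error(`Upload failed with status ${uploadRes.status}`);
  const { fileUrl } = await uploadRes.json();

  // 2. Start the transcription job; the server replies once the job finishes
  const transcribeRes = await fetchImpl(`${baseUrl}/api/transcribe`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileUrl }),
  });
  if (!transcribeRes.ok) throw new Error(`Transcription failed with status ${transcribeRes.status}`);
  const { transcript } = await transcribeRes.json();
  return transcript;
}
```

For example, `uploadAndTranscribe(require('fs').readFileSync('sample.wav'), 'sample.wav').then(console.log)`, where sample.wav is a placeholder file name. Because `/api/transcribe` only answers after the job completes, expect that second request to take a while for longer recordings.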

Notes

Security: Never hardcode AWS credentials in your code. Use environment variables as shown in this example. Additionally, consider using IAM roles and policies for better security.
Error Handling: Ensure you handle various error cases, such as invalid file types, AWS service errors, and network issues.
Deployment: When deploying this application, make sure to configure environment variables and ensure that your AWS credentials are securely managed.
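For the invalid-file-type case, one option is to check the extension before uploading and derive the MediaFormat from it instead of the hard-coded 'wav'. A minimal sketch; `mediaFormatFor` is hypothetical, and the format list is an assumption based on the MediaFormat values in the Transcribe API reference, so verify it against the current documentation.

```javascript
const path = require('path');

// Hypothetical helper: map a file name to a Transcribe MediaFormat value,
// or return null for unsupported extensions.
// Assumed list; verify against the current StartTranscriptionJob MediaFormat docs.
const SUPPORTED_FORMATS = new Set(['amr', 'flac', 'm4a', 'mp3', 'mp4', 'ogg', 'wav', 'webm']);

function mediaFormatFor(fileName) {
  const ext = path.extname(fileName).slice(1).toLowerCase(); // 'Clip.WAV' -> 'wav'
  return SUPPORTED_FORMATS.has(ext) ? ext : null;
}

console.log(mediaFormatFor('Clip.WAV')); // wav
console.log(mediaFormatFor('notes.txt')); // null
```

With Multer, the same check could sit in the `fileFilter` option, e.g. `multer({ storage, fileFilter: (req, file, cb) => cb(null, mediaFormatFor(file.originalname) !== null) })`. A rejected file leaves `req.file` undefined, so the upload route's existing 400 response covers it.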