
How to Convert Speech to Text Using Node.js & the AWS Transcribe Service

A step-by-step guide to creating a Node.js application that converts speech to text using the AWS Transcribe service.

First, install Node.js on your system; it comes with npm by default.
You also need an AWS account.
Create an S3 bucket where you can upload the audio file. Transcribe reads the file from S3, converts the speech to text, and stores the result back in the same bucket.
For that, you need to attach a policy to the S3 bucket.
Click here to set up the S3 bucket policy.
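
As a rough idea of what that policy looks like, here is a minimal sketch that makes objects in the bucket (including the transcript output) publicly readable, which is what the plain fetch of the transcript later in this guide relies on. The bucket name is a placeholder, and a policy this permissive is only suitable for testing; follow the linked post for a proper setup.

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "AllowPublicReadForTranscripts",
                    "Effect": "Allow",
                    "Principal": "*",
                    "Action": "s3:GetObject",
                    "Resource": "arn:aws:s3:::your-bucket-name/*"
                }
            ]
        }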
 

Step 1: Set Up Your Project

1. Initialize your Node.js project:

Copy and paste the following commands into your terminal:

                mkdir my-audio-app
                cd my-audio-app
                npm init -y

2. Install required packages:

Copy and paste the following command into your terminal:

npm install express multer aws-sdk uuid dotenv cors node-fetch@2

(node-fetch v2 is included because server.js loads it with require(); node-fetch v3 is ESM-only.)

Step 2: Create the Application Files

Create the main server file:

Create a file named server.js (or app.js if you prefer).

Set up environment variables:

Create a .env file in your project root directory and add the following:
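
The variable names below are the ones server.js reads; the values are placeholders to replace with your own region and credentials (never commit this file to version control):

        AWS_REGION=us-east-1
        AWS_ACCESS_KEY_ID=your-access-key-id
        AWS_SECRET_ACCESS_KEY=your-secret-access-key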

Step 3: Write the Code

       // server.js
       const express = require('express');
       const multer = require('multer');
       const AWS = require('aws-sdk');
       const { v4: uuidv4 } = require('uuid');
       const cors = require('cors');
       const fetch = require('node-fetch'); // node-fetch v2 (CommonJS) for fetching the transcript JSON

       require('dotenv').config();  // Load environment variables from .env

       const app = express();
       const port = 3000;

       // CORS middleware
       app.use(cors());

       // AWS configuration
       AWS.config.update({
           region: process.env.AWS_REGION,
           accessKeyId: process.env.AWS_ACCESS_KEY_ID,
           secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
       });

       const s3 = new AWS.S3();
       const transcribe = new AWS.TranscribeService();

       // Middleware to parse JSON
       app.use(express.json());

       // Multer setup for in-memory storage
       const storage = multer.memoryStorage();
       const upload = multer({ storage: storage });

       // Route for uploading audio files to S3
       app.post('/api/upload', upload.single('audio'), (req, res) => {
           if (!req.file) {
               return res.status(400).send('No file uploaded.');
           }

           const fileName = `${uuidv4()}-${req.file.originalname}`;
           const uploadParams = {
               Bucket: 'your-bucket-name', // replace with your S3 bucket name
               Key: fileName,
               Body: req.file.buffer,
               ContentType: req.file.mimetype
           };

           s3.upload(uploadParams, (err, data) => {
               if (err) {
                   console.error('Error uploading file:', err);
                   return res.status(500).send('Failed to upload audio file.');
               }
               res.status(200).json({ fileUrl: data.Location });
           });
       });

       // Route for starting a speech-to-text transcription job
       app.post('/api/transcribe', (req, res) => {
           const fileUrl = req.body.fileUrl;

           if (!fileUrl) {
               return res.status(400).send('fileUrl is required.');
           }

           const jobName = `YourJobName-${Date.now()}`;
           const jobParams = {
               TranscriptionJobName: jobName,
               LanguageCode: 'en-US',
               Media: {
                   MediaFileUri: fileUrl
               },
               MediaFormat: 'wav', // change if you upload a different audio format
               OutputBucketName: 'your-bucket-name' // replace with your S3 bucket name
           };

           transcribe.startTranscriptionJob(jobParams, (err, data) => {
               if (err) {
                   console.error('Error starting transcription job:', err);
                   return res.status(500).send('Failed to start transcription job.');
               }

               // Poll the job status until it completes or fails
               const checkJobStatus = () => {
                   transcribe.getTranscriptionJob({ TranscriptionJobName: jobName }, (err, data) => {
                       if (err) {
                           console.error('Error getting transcription job status:', err);
                           return res.status(500).send('Failed to get transcription job status.');
                       }

                       if (data.TranscriptionJob.TranscriptionJobStatus === 'COMPLETED') {
                           const transcriptUri = data.TranscriptionJob.Transcript.TranscriptFileUri;
                           fetch(transcriptUri)
                               .then(response => response.json())
                               .then(json => {
                                   res.status(200).json({ transcript: json.results.transcripts[0].transcript });
                               })
                               .catch(error => {
                                   console.error('Error fetching transcript:', error);
                                   res.status(500).send('Failed to fetch transcript.');
                               });
                       } else if (data.TranscriptionJob.TranscriptionJobStatus === 'FAILED') {
                           res.status(500).send('Transcription job failed.');
                       } else {
                           setTimeout(checkJobStatus, 5000); // Check status again in 5 seconds
                       }
                   });
               };

               checkJobStatus(); // Start polling for job status
           });
       });

       // Start the server
       app.listen(port, () => {
           console.log(`Server running at http://localhost:${port}/`);
       });
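
Once the server is running (node server.js), you can exercise the two routes from the terminal; the file name, bucket name, and UUID below are placeholders:

        curl -F "audio=@sample.wav" http://localhost:3000/api/upload
        # -> {"fileUrl":"https://your-bucket-name.s3.amazonaws.com/<uuid>-sample.wav"}

        curl -X POST http://localhost:3000/api/transcribe \
             -H "Content-Type: application/json" \
             -d '{"fileUrl":"https://your-bucket-name.s3.amazonaws.com/<uuid>-sample.wav"}'

Note that the /api/transcribe request stays open while the job is polled, so the response can take a while even for short clips and may time out for long recordings.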
       

Code Info:

This Node.js application sets up an Express server that handles audio file uploads and speech-to-text transcription using AWS services. Here’s a breakdown of the code:

  1. Dependencies and Configuration: The code uses Express for server functionality, Multer for handling file uploads, AWS SDK for interacting with AWS services (S3 for file storage and Transcribe for transcription), uuid for generating unique file names, and node-fetch to retrieve transcription results. Environment variables are loaded from a .env file using dotenv.

  2. CORS and Middleware: CORS middleware allows cross-origin requests. JSON middleware parses incoming JSON data.

  3. File Upload Endpoint (/api/upload): Multer stores uploaded audio files in memory. The file is then uploaded to an S3 bucket with a unique name. The S3 URL of the uploaded file is returned to the client.

  4. Transcription Endpoint (/api/transcribe): This endpoint starts a transcription job using AWS Transcribe and polls the job status until it completes or fails. Upon completion, it fetches the transcript JSON from the S3 bucket and returns the transcript text to the client (a sample of that JSON is shown below).

  5. Server Setup: The server listens on port 3000 and logs a message confirming its running status.

This setup enables uploading audio files, triggering transcription, and retrieving the speech-to-text transcription results.
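
For reference, the transcript file that Transcribe writes to the output bucket is a JSON document roughly of this shape (trimmed to the relevant fields; the values are illustrative), which is why the code reads json.results.transcripts[0].transcript:

        {
            "jobName": "YourJobName-1700000000000",
            "accountId": "123456789012",
            "status": "COMPLETED",
            "results": {
                "transcripts": [
                    { "transcript": "hello world this is a test" }
                ],
                "items": [ "... word-level timing and confidence entries ..." ]
            }
        }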

Notes

  1. Security: Never hardcode AWS credentials in your code. Use environment variables as shown in this example. Additionally, consider using IAM roles and policies for better security (a minimal example policy is sketched after these notes).

  2. Error Handling: Ensure you handle various error cases, such as invalid file types, AWS service errors, and network issues (a small Multer fileFilter sketch for file-type checks also follows these notes).

  3. Deployment: When deploying this application, make sure to configure environment variables and ensure that your AWS credentials are securely managed.
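
On the security note above, here is a minimal sketch of an IAM policy scoped to roughly what this application needs; the bucket name is a placeholder, and you would attach it to the IAM user or role whose credentials the server uses:

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "transcribe:StartTranscriptionJob",
                        "transcribe:GetTranscriptionJob"
                    ],
                    "Resource": "*"
                },
                {
                    "Effect": "Allow",
                    "Action": [
                        "s3:PutObject",
                        "s3:GetObject"
                    ],
                    "Resource": "arn:aws:s3:::your-bucket-name/*"
                }
            ]
        }

On the error-handling note, one simple way to reject non-audio uploads is a Multer fileFilter; the accepted MIME types below are just an example list:

        // Example: accept only common audio MIME types (adjust the list as needed)
        const upload = multer({
            storage: multer.memoryStorage(),
            fileFilter: (req, file, cb) => {
                const allowed = ['audio/wav', 'audio/x-wav', 'audio/mpeg', 'audio/mp4'];
                // Passing false rejects the file; req.file stays undefined, so the
                // existing 'No file uploaded' check in the upload route returns a 400.
                cb(null, allowed.includes(file.mimetype));
            }
        });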
