Speech Activity

Speech activity is used to perform AI-based audio analysis. With this activity, you can convert speech to text, convert text to speech, and verify a speaker.

Using Speech Activity

  1. In the Canvas Tools pane, click Process Components to expand it and view the associated activities.
  2. Drag the Application activity and drop it onto the Flowchart designer area on the Canvas.


  3. In the Application Type list, select CognitiveApps.
  4. In the Select An Application list, select the application.
  5. Double-click the Application activity so that you can add the Speech activity inside it. Cognitive activities work only inside an Application activity.


  6. In the Canvas Tools pane, click Cognitive Services to expand it and view the associated activities.
  7. Drag the Speech activity and drop it onto the Flowchart designer area on the Canvas.


  8. In the Provider list, select the provider. By default, the provider is set to Google.



    Automation Studio supports two service providers for the cognitive speech API:
    • Microsoft
    • Google

 

  9. In the Service list, select the service that you want to use.



    The Service list shows only the services supported by the selected provider. If you select Google as the provider, the following services appear in the Service list:
    • Speech To Text: Converts audio to text.
    • Text To Speech: Converts text to audio.

    If you select Microsoft as the provider, the following services appear in the Service list:
    • Speech To Text: Converts audio to text.
    • Speaker Verification: Verifies an audio input against a pre-enrolled profile.
  10. Click the Settings icon. The Configuration window appears.


     
    1. API URL: Provide the API URL for the provider and service that you selected in the Speech activity.
    2. File Path: Provide the path of the file (speech or text) to analyze. This is the input for the API. To provide the file path, create an argument in the Argument pane, and then select that argument in the File Path list.
    3. Text Language: An optional parameter that specifies the language. Leave it blank to detect the language automatically. Specify the language in the format en-US, where en denotes English and US denotes the United States; together, en-US denotes US English.
    4. Output: Provide the JSON file in which to store the output of the API. To provide the JSON file, create an argument in the Argument pane, and then select that argument in the Output list.
    5. Time Interval: Trims a long audio file to a shorter clip for analysis. Specify the times in MM:SS (minutes:seconds) format. This field appears only if the selected service is Speech To Text or Speaker Verification.
    6. Verification Profile Id: An optional field for the Speaker Verification API. Provide the ID of a pre-enrolled profile. This field appears only if the selected service is Speaker Verification.
    7. Audio Output Path: Provide the path at which to store the audio output. To provide the path, create an argument in the Argument pane, and then select that argument in the Output list. Specify the output argument as a path string, for example, D:\audio\sample.wav.
      The parameters shown vary depending on the selected provider and service.
  11. Click Close to confirm your changes.
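The Time Interval field trims the input audio to a shorter clip before analysis. The following is a minimal Python sketch of that idea for an uncompressed WAV file, using only the standard-library `wave` module. It is not Automation Studio's implementation; the function names `parse_mm_ss` and `trim_wav` are hypothetical, and it assumes that both the start and end times are given in MM:SS format.

```python
import wave


def parse_mm_ss(value: str) -> int:
    """Convert an 'MM:SS' (minutes:seconds) string into a number of seconds."""
    minutes_text, seconds_text = value.split(":")
    minutes, seconds = int(minutes_text), int(seconds_text)
    if minutes < 0 or not 0 <= seconds < 60:
        raise ValueError(f"expected MM:SS with seconds 00-59, got {value!r}")
    return minutes * 60 + seconds


def trim_wav(src_path: str, dst_path: str, start: str, end: str) -> int:
    """Copy the frames between start and end (both MM:SS) into a new WAV file.

    Returns the number of frames written. Times beyond the end of the
    recording are clamped to its length.
    """
    start_s, end_s = parse_mm_ss(start), parse_mm_ss(end)
    if end_s <= start_s:
        raise ValueError("end time must be later than start time")
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate, total = src.getframerate(), src.getnframes()
        first = min(start_s * rate, total)  # first frame to keep
        last = min(end_s * rate, total)     # frame just past the clip
        src.setpos(first)
        frames = src.readframes(last - first)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate as the source
        dst.writeframes(frames)  # the header's frame count is corrected on close
    return last - first
```

For example, `trim_wav("input.wav", "clip.wav", "00:10", "00:40")` would keep 30 seconds of audio, provided the recording is at least 40 seconds long.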

Speech Properties

The properties of the Speech activity are listed in the following table and can be edited in the Properties grid in the right pane.

 

Property Name

Usage

Control Execution

Ignore Error

When this option is set to Yes, the application ignores any error while executing the activity.

If set to NA, the exception (if any) is bypassed so that the automation flow continues; however, the automation status is marked as failed if an exception occurs.

By default, this option is set to No.

Delay

Wait After (ms)

Specify the time delay that must occur after the activity is executed. The value must be in milliseconds.

Wait Before (ms)

Specify the time delay that must occur before the activity is executed. The value must be in milliseconds.

Misc

ApplicationID

The ID of the application. It is created and managed internally by Automation Studio.

Breakpoint

Select this option to mark this activity as a pause point while debugging the process. Execution pauses at this activity, allowing you to examine whether the process is functioning as expected.

In large or complex processes, breakpoints help you identify errors.

Commented

Select this option to mark this activity as inactive in the entire process. When an activity is commented, it is ignored during the process execution.

DisplayName

The display name of the activity in the flowchart designer area. By default, the name is set as Speech. You can change the name as required.

EndTime

Specify the end time for the audio clip. The format to specify the end time is MM:SS.

FilePath

Specify the file path for the input. The file path specified here is displayed in the File Path parameter of the Configuration window, and vice versa.

InputText

Specify the file path for the input. The file path specified here is displayed in the Input Text parameter of the Configuration window, and vice versa.

ResultAudioPath

Specify the path at which to store the audio output. The path specified here is displayed in the Audio Output Path parameter of the Configuration window, and vice versa.

ResultJson

Specify the path of the JSON file in which to store the JSON output. The path specified here is displayed in the Output parameter of the Configuration window, and vice versa.

SelectedService

The name of the service selected in the Speech activity box. You can change the service as required.

StartTime

Specify the start time for the audio clip. The format to specify the start time is MM:SS.

TextLanguages

Specify the text language in this field, in the format en-US. Alternatively, you can enter the text language in the Text Language field of the Configuration window. The text language entered in the Properties grid is reflected in the Configuration window, and vice versa.

Url

Specify the API URL in this field. Alternatively, you can specify the API URL by selecting the argument that holds it in the API URL list of the Configuration window. The API URL specified in the Properties grid is reflected in the Configuration window, and vice versa.

VerificationProfileId

Specify the ID of the pre-enrolled profile for speaker verification. The verification profile ID is displayed in the Verification Profile Id parameter of the Configuration window.
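The Commented, Wait Before (ms), Ignore Error, and Wait After (ms) properties described above interact when the activity runs. The following Python sketch is a hypothetical model of those documented behaviors, not Automation Studio code. The names `run_activity`, `IgnoreError`, and `RunStatus` are illustrative, and the sketch assumes that Wait After still applies when an error is ignored and that Yes, unlike NA, does not mark the run as failed.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class IgnoreError(Enum):
    NO = "No"    # default: the error stops the flow
    YES = "Yes"  # the error is ignored
    NA = "NA"    # the flow continues, but the run is marked as failed


@dataclass
class RunStatus:
    failed: bool = False


def run_activity(
    action: Callable[[], None],
    status: RunStatus,
    ignore_error: IgnoreError = IgnoreError.NO,
    wait_before_ms: int = 0,
    wait_after_ms: int = 0,
    commented: bool = False,
) -> Optional[bool]:
    """Hypothetical model of the Control Execution, Delay, and Commented properties.

    Returns True if the action completed, False if an error was swallowed,
    and None if the activity was skipped because it is commented.
    """
    if commented:
        return None  # a commented activity is ignored during execution
    time.sleep(wait_before_ms / 1000)  # Wait Before (ms)
    completed = True
    try:
        action()
    except Exception:
        if ignore_error is IgnoreError.NO:
            raise  # default behavior: the error propagates
        if ignore_error is IgnoreError.NA:
            status.failed = True  # continue, but report the run as a failure
        completed = False
    time.sleep(wait_after_ms / 1000)  # Wait After (ms); assumed to apply after a swallowed error too
    return completed
```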

 

See Also

Languages supported:  

Google Text To Speech API:  https://cloud.google.com/text-to-speech/docs/voices  

Google Speech To Text API:  https://cloud.google.com/speech-to-text/docs/languages  

Microsoft Speaker Verification API: https://westus.dev.cognitive.microsoft.com/docs/services/563309b6778daf02acc0a508/operations/563309b7778daf06340c9652  

Microsoft Speech To Text API:  https://docs.microsoft.com/en-in/azure/cognitive-services/speech-service/language-support  

 

For more information:

Google: https://cloud.google.com/text-to-speech/docs/reference/rest/v1beta1/text/synthesize  

https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize

Microsoft: https://westus.dev.cognitive.microsoft.com/docs/services/563309b6778daf02acc0a508/operations/56406930e597ed20c8d8549c
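For orientation, the following Python sketch builds JSON request bodies in the shape documented by the Google references above (`speech:recognize` in the v1 Speech-to-Text API and `text:synthesize` in the v1beta1 Text-to-Speech API), and decodes the base64 `audioContent` field of a synthesize response into an audio file. It only prepares and parses payloads; sending the requests, including authentication, is omitted. The helper names are hypothetical, and the sketch does not describe how Automation Studio calls these APIs internally.

```python
import base64
import json


def build_recognize_request(audio_bytes: bytes, language_code: str = "en-US") -> dict:
    """Body for POST https://speech.googleapis.com/v1/speech:recognize.

    encoding and sampleRateHertz are omitted here; Google documents them as
    optional for WAV and FLAC input, whose headers carry that information.
    """
    return {
        "config": {"languageCode": language_code},
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def build_synthesize_request(
    text: str, language_code: str = "en-US", audio_encoding: str = "LINEAR16"
) -> dict:
    """Body for POST https://texttospeech.googleapis.com/v1beta1/text:synthesize."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


def save_synthesized_audio(response_json: str, audio_output_path: str) -> int:
    """Decode the base64 audioContent of a synthesize response and write it to disk.

    Returns the number of audio bytes written.
    """
    audio = base64.b64decode(json.loads(response_json)["audioContent"])
    with open(audio_output_path, "wb") as f:
        f.write(audio)
    return len(audio)
```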