The following input formats are supported for S2T/ASR and the other speech-source APIs (Real-Time API Series). The audio file must be Base64 encoded and passed as one of the required parameters.
WAV, M4A, MP3, AAC, FLAC, OGG, G.711, G.722, Speex, Opus
We suggest 48 kHz, mono, Base64-encoded M4A for the smallest file size and faster response times, but any of the formats listed can be used.
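
As a minimal sketch of the Base64 step, the Python helper below reads an audio file and returns its contents as a Base64 string using only the standard library. The function name `encode_audio_file` is hypothetical, and the name of the request parameter that carries the string is not given here, so check the API reference for the exact field.

```python
import base64
from pathlib import Path


def encode_audio_file(path):
    """Return the contents of an audio file as a Base64 ASCII string.

    Hypothetical helper: the source only states that the file must be
    Base64 encoded, not how the string is named in the request.
    """
    raw = Path(path).read_bytes()                 # read the file as raw bytes
    return base64.b64encode(raw).decode("ascii")  # bytes -> Base64 text
```

Calling `encode_audio_file("recording.m4a")` (an example file name) yields the string to place in the request. Base64 adds roughly one third to the payload size, which is one reason a compact format such as M4A helps.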
Output is currently always a Base64-encoded MP3: mono, 48 kHz, 32 bit.
Use for: Real-Time: S2ST, S2STT
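
Because the audio output arrives as Base64 text, it has to be decoded before it can be played or saved. The sketch below assumes the Base64 string has already been extracted from the response; the function name `save_audio_output` is hypothetical, and the response field that holds the string is not specified here.

```python
import base64
from pathlib import Path


def save_audio_output(b64_audio, out_path):
    """Decode a Base64 audio string and write it to out_path.

    Hypothetical helper: assumes b64_audio is the Base64-encoded MP3
    taken from the API response. Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(b64_audio)  # Base64 text -> raw MP3 bytes
    Path(out_path).write_bytes(audio_bytes)    # save as a playable file
    return len(audio_bytes)
```

For example, `save_audio_output(b64_string, "translated.mp3")` writes a file that a standard MP3 player should be able to open.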