This documents my audio setup and how I use different tools to record spoken articles.
I have a Dell Latitude D600 running OpenBSD 3.5 (-current as of 6/1/2005). I use a Radio Shack clip-on condenser microphone (it was about $11), and Sony over-the-ear bud-style headphones.
I use sox, a fairly common "Swiss Army knife" for processing digital audio; the tools from the Ogg Vorbis toolkit and its supporting libraries; and flac.
I use the following bash function to record an audio file:
soxrec() {
    sox -t ossdsp -w -s -c 2 -r 44100 /dev/sound -t raw -w -s -c 1 -r 44100 "$@"
}

soxrec Article_name-sect.cdda
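The sox command has the general form "sox [input-options] input-file [output-options] output-file". The options above require some explanation: -t gives the file type (ossdsp for the sound device, raw for headerless output), -w selects 16-bit ("word") samples, -s selects signed samples, -c sets the number of channels, and -r sets the sample rate in Hz.
I use the ".cdda" extension because the format resembles CD digital audio; this may not literally be the case, but these files are ultimately temporary and the extension is just a convention I'm using.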
I record the article paragraph by paragraph, numbering each section (the section number is given in its "edit" link) as I go. So, for example, the first paragraph of the introduction is Article_name-00a.cdda, the second paragraph is Article_name-00b.cdda, etc.
Since the resulting sound files are raw audio data (i.e. they lack a descriptive header like a .wav file or any form of compression or encapsulation) they can be simply concatenated using cat.
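Because the files have no header, their length in seconds follows directly from their size. The sketch below assumes the format used above (44100 samples per second, 16-bit, mono, so 88200 bytes per second); the function name raw_seconds is only an illustration, not part of my actual setup.

```shell
# Print the total running time, in whole seconds, of the raw files given
# as arguments. Assumes 44100 Hz * 2 bytes per sample * 1 channel.
raw_seconds() {
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f")        # size of this file in bytes
        total=$((total + size))
    done
    echo $((total / 88200))
}
```

Running raw_seconds on the individual section files and on the output of cat should give the same figure, which is a quick check that nothing was lost in concatenation.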
The sox command also allows you to manipulate the files in various ways. I use the "trim" effect, for example, to cut a file on time boundaries, such as when only part of it needs to be re-recorded or I've left too long a pause at the end.
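As a rough sketch of the trim step, a small wrapper like the following would keep only the first part of a section file. The name trim_section and its argument order are made up for illustration; the format flags are copied from the recording function above, and the effect is invoked as "trim start length".

```shell
# Hypothetical helper: copy the first LENGTH seconds of a raw section
# file to a new raw file, dropping whatever follows (e.g. a long pause).
# Usage: trim_section infile outfile length
trim_section() {
    infile=$1
    outfile=$2
    length=$3
    sox -t raw -w -s -c 1 -r 44100 "$infile" \
        -t raw -w -s -c 1 -r 44100 "$outfile" \
        trim 0 "$length"
}
```

For example, "trim_section Article_name-00a.cdda Article_name-00a-cut.cdda 12.5" would keep the first 12.5 seconds; the output name here is only an example.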
I use the "compand" filter to level and normalize the audio. First, I concatenate the raw files into a .wav file, as shown below, with the "stat" filter. The "stat" filter prints a "Volume Adjustment:" recommendation for normalizing the audio (that is, making it as loud as possible without clipping). I use that value as the third argument to the compand filter, so the argument list for "compand" is "0.1,0.3 -60,-60,-30,-15,-20,-12,-4,-8,-2,-7 volume", where "volume" stands for the recommended value. I also find that applying a lowpass filter with an argument of 4000 Hz helps to soften sibilants and reduce some noise.
Anyway, to produce a .wav file (this makes it slightly more convenient to encode, though the encoders all accept raw audio data):
cat Article_name-*.cdda | sox -t raw -w -s -c 1 -r 44100 - -t wav -w -s -c 1 -r 44100 Article_name.wav lowpass 4000 compand 0.1,0.3 -60,-60,-30,-15,-20,-12,-4,-8,-2,-7 volume
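The volume figure from the "stat" pass can be picked out of its report automatically rather than read by eye. This is only a sketch: it assumes the report contains a line of the form "Volume adjustment:    1.234" (label, colon, number), and the function name stat_volume is my own invention.

```shell
# Read a "stat" report on standard input and print only the recommended
# volume adjustment. Assumes a line like "Volume adjustment:    1.234";
# the label is matched case-insensitively.
stat_volume() {
    awk -F: 'tolower($1) == "volume adjustment" {
        gsub(/[ \t]/, "", $2)   # strip the padding around the number
        print $2
    }'
}
```

Since sox normally writes the "stat" report to standard error, the report has to be redirected (for example with 2>&1) before it is piped into stat_volume.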
I always listen to the full resulting .wav file to make sure it sounds okay before encoding. I usually decide to re-record one or two parts.
I use oggenc to encode the .wav file with the appropriate bitrate (I'm currently using an average bitrate of 64 kb/s, not the project's recommended 48 kb/s, because it sounds better to me).
oggenc -d 'YYYY-MM-DD' \
    -a 'Demi @ Wikipedia' \
    -t 'Article title' \
    -c source="From Wikipedia, the free encyclopedia: http://www.wikipedia.org" \
    -c copying="Licensed under GNU Free Documentation License: http://www.fsf.org/licensing/licenses/fdl.txt" \
    -c article="http://en.wikipedia.org/wiki/Article_title" \
    -c version="HH:MI, YYYY Mon DD UTC" \
    -b 64 \
    -o Article_title.ogg \
    Article_title.wav
I want to make an attempt to keep article readings up-to-date as the articles change (or at least if they change significantly). So, I keep the individual section files around, compressed with the flac utility:
flac --channels=1 --bps=16 --sample-rate=44100 --sign=signed --endian=little Article_title-*.cdda
Then, I remove the *.cdda and .wav files. When an article changes, I can re-record the changed sections, concatenate them back together and re-encode.
To reproduce the appropriate raw audio stream for input to sox to make a .wav file, as above, I do something like:
flac -dc --force-raw-format --endian=little --sign=signed *.flac | ...
Note that I'm using an Intel computer: your endianness may vary.
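If you are not sure which byte order your machine uses, a check along these lines should tell you. It is a sketch, not part of my own workflow: it writes the two bytes 0x01 0x00 and asks od to read them as a single 16-bit word in the host's byte order. The name host_endian is hypothetical.

```shell
# Print "little" or "big" according to the host byte order.
# Bytes 0x01 0x00 read as one 16-bit word give 0001 on a little-endian
# machine and 0100 on a big-endian one.
host_endian() {
    word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
    case "$word" in
        0001) echo little ;;
        0100) echo big ;;
        *)    echo unknown ;;
    esac
}
```

The result could then be passed to flac in place of the hard-coded value, as in --endian="$(host_endian)".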