Pretty new to scripting in AE so take this with a grain of salt.
One idea: a script that checks the width of the text layer at each frame, adds a null, and puts a layer marker on that null at every frame where the width changes. The script could then add your sound footage at each marker it placed on the null. You'd run the script once your text animation is finished.
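Here's a rough, untested sketch of that idea in ExtendScript-style JavaScript. The function names (`findWidthChangeFrames`, `markWidthChanges`), the 0.5 pixel tolerance, the null's name, and the marker comment are all made up for illustration. It also assumes the width reported by `sourceRectAtTime` actually changes with your particular text animation, which I haven't verified.

```javascript
// Pure helper: given one measured width per frame, return the frame indices
// where the width differs from the previous frame by more than `tolerance`.
function findWidthChangeFrames(widths, tolerance) {
    var frames = [];
    for (var i = 1; i < widths.length; i++) {
        if (Math.abs(widths[i] - widths[i - 1]) > tolerance) {
            frames.push(i);
        }
    }
    return frames;
}

// After Effects part (only works inside AE; not called here).
// comp: a CompItem, textLayer: a text layer in that comp,
// soundItem: an imported footage item from the project, or null to skip audio.
function markWidthChanges(comp, textLayer, soundItem) {
    var frameCount = Math.round(comp.duration / comp.frameDuration);

    // Sample the layer's width once per frame.
    var widths = [];
    for (var f = 0; f < frameCount; f++) {
        var rect = textLayer.sourceRectAtTime(f * comp.frameDuration, false);
        widths.push(rect.width);
    }

    // Assumed tolerance of 0.5 px to ignore tiny sub-pixel differences.
    var changes = findWidthChangeFrames(widths, 0.5);

    // Null layer that only exists to carry the markers.
    var markerNull = comp.layers.addNull();
    markerNull.name = "Width changes";

    for (var i = 0; i < changes.length; i++) {
        var t = changes[i] * comp.frameDuration;
        markerNull.property("Marker").setValueAtTime(t, new MarkerValue("width change"));
        if (soundItem) {
            // Add one copy of the sound and start it at the marker time.
            var soundLayer = comp.layers.add(soundItem);
            soundLayer.startTime = t;
        }
    }
    return changes.length;
}
```

To try it, you'd select the text layer in the active comp and call something like `markWidthChanges(app.project.activeItem, app.project.activeItem.selectedLayers[0], null)`, ideally wrapped in `app.beginUndoGroup(...)` / `app.endUndoGroup()` so it can be undone in one step. Keeping the change detection in its own function means you can check that part without opening AE at all.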
It's been 5 days since you posted, so you may have already figured out a solution, but I thought I'd chime in with some input anyway.