This work studies the use of attention masking in transducer-based speech recognition to build a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, in which the same attention mask is applied at every frame, with chunked masking, in which the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where attention masks are sampled from a target distribution at training time, to build models that can operate in different configurations. Finally, we investigate how a single configurable model can be used to perform both first-pass streaming recognition and second-pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs. latency trade-off than fixed masking, both with and without FastEmit. We also show that variable masking improves accuracy by up to 8% relative in the acoustic rescoring scenario.
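To make the chunked masking idea concrete, the following is a minimal sketch (not the paper's implementation) of how a chunk-based attention mask could be constructed: each frame attends to all frames in its own chunk plus a configurable number of left-context chunks. The chunk size, the number of left-context chunks, and the boolean mask convention are illustrative assumptions.

```python
import numpy as np

def chunked_attention_mask(num_frames: int, chunk_size: int, left_chunks: int) -> np.ndarray:
    """Return a (num_frames, num_frames) boolean mask.

    mask[i, j] is True when frame i may attend to frame j, i.e. when j lies
    in the same chunk as i or in one of the `left_chunks` preceding chunks.
    """
    frame_idx = np.arange(num_frames)
    chunk_idx = frame_idx // chunk_size            # chunk id of each frame
    lo = (chunk_idx - left_chunks) * chunk_size    # first visible frame per query
    hi = (chunk_idx + 1) * chunk_size              # one past the last visible frame
    return (frame_idx[None, :] >= lo[:, None]) & (frame_idx[None, :] < hi[:, None])

# Example: 8 frames, chunks of 2 frames, one left-context chunk.
print(chunked_attention_mask(8, chunk_size=2, left_chunks=1).astype(int))
```

Under this sketch, fixed masking corresponds to applying the same left/right context window at every frame, while variable masking would sample the chunk size (and context) from a target distribution for each training batch.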