Introduction to Speech Signal Processing

sarah0518 2023. 3. 9. 22:40

Basic concepts of speech signals

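vocal cord (성대): the folds in the larynx that vibrate to produce voiced sound
amplitude (진폭)
frequency (주파수)
resonant frequency (공진주파수)
Fourier transform: decomposes a signal into its sinusoidal components, showing how much of each frequency it contains
If a signal has period P on the time axis, its components in the frequency domain are spaced 1/P apart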
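
The relation between a period P in time and a spacing of 1/P in frequency can be checked numerically. The following is a minimal sketch using NumPy; the sampling rate `fs` and the period `P` are assumed values chosen only for illustration.

```python
import numpy as np

fs = 8000   # assumed sampling rate in Hz
P = 80      # assumed period in samples (80 / 8000 = 10 ms)
n = 800     # signal length: exactly 10 periods

# Periodic signal: one impulse every P samples.
x = np.zeros(n)
x[::P] = 1.0

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# Frequencies where the spectrum is non-zero.
peaks = freqs[spectrum > 1e-6]

# Spacing between neighbouring spectral lines, in Hz.
spacing = peaks[1] - peaks[0]
print(spacing, 1.0 / (P / fs))
```

Here the period is 10 ms, so the spectral lines sit at multiples of 1 / 0.01 s = 100 Hz.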

Speech production is modeled as a source signal that passes through a filter and comes out as the speech sequence.

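The filter is time-varying, because the shape of the vocal tract and the position of the tongue change with every sound, for example when pronouncing "가", "나", "다" (in other words, filter = vocal tract).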
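
A minimal sketch of this source-filter idea: the same periodic source is passed through two different filter settings, as if the vocal tract had moved between two sounds. The two-pole resonator, the resonance frequencies (700 Hz and 1200 Hz), the bandwidth, and the sampling rate are all assumptions for illustration, not values from the lecture.

```python
import numpy as np

def resonator(x, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator: a crude stand-in for one vocal tract resonance."""
    r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius from bandwidth
    theta = 2.0 * np.pi * freq_hz / fs       # pole angle from center frequency
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] -= a1 * y[n - 1]
        if n >= 2:
            y[n] -= a2 * y[n - 2]
    return y

fs = 8000                 # assumed sampling rate (Hz)
source = np.zeros(800)    # 100 ms of source signal
source[::80] = 1.0        # periodic source: one pulse every 10 ms (F0 = 100 Hz)

# Same source, two filter settings (hypothetical resonances).
sound_a = resonator(source, 700.0, 100.0, fs)
sound_b = resonator(source, 1200.0, 100.0, fs)
speech = np.concatenate([sound_a, sound_b])

def peak_hz(segment):
    """Frequency of the strongest spectral component of a segment."""
    spectrum = np.abs(np.fft.rfft(segment))
    return np.fft.rfftfreq(len(segment), d=1.0 / fs)[np.argmax(spectrum)]

peak_a, peak_b = peak_hz(sound_a), peak_hz(sound_b)
print(peak_a, peak_b)
```

The source is identical in both halves; only the filter changes, and the strongest spectral component moves with it.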

What is an analysis-synthesis system?

▫ Analyze the speech production mechanism.

▫ Separate the source and the filter.

▫ Parameterize the source and the filter.

▫ Transmit the parameters.

▫ Synthesize the signal by using the parameters (that is, turn the parameters back into a signal).

Three types of source

noisy (unvoiced): fricatives (random, noise-like sound)
periodic (voiced)
impulsive: plosives
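
As a rough illustration, the three source types can be mimicked with idealized signals: white noise, an impulse train, and a single impulse. This is only a sketch; real glottal and noise sources are more complex, and the sampling rate and lengths below are assumed.

```python
import numpy as np

fs = 8000   # assumed sampling rate (Hz)
n = 800     # 100 ms of signal
rng = np.random.default_rng(0)

# Noisy (unvoiced) source: random samples, as in a fricative.
noisy = rng.standard_normal(n)

# Periodic (voiced) source: one pulse per pitch period (here 10 ms, F0 = 100 Hz).
periodic = np.zeros(n)
periodic[::80] = 1.0

# Impulsive source: a single burst, as in a plosive.
impulsive = np.zeros(n)
impulsive[n // 2] = 1.0
```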

Three phases of the glottal cycle

Closed → Open → Return

** Note on the return phase: it happens very quickly.

Pitch period: the period of the periodic source (one cycle)

The reciprocal of the pitch period is called the pitch or fundamental frequency (symbol F0).

Pitch range: 60~400 Hz
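
One common way to estimate F0 from a voiced frame is to look for the strongest autocorrelation peak among lags that correspond to the 60~400 Hz range. The sketch below uses that approach; it is not necessarily the method used in the lecture, and the test signal (125 Hz fundamental plus one harmonic, 8 kHz sampling rate) is an assumption.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) as fs / lag of the strongest autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)   # shortest pitch period considered (samples)
    lag_max = int(fs / f0_min)   # longest pitch period considered (samples)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

fs = 8000                          # assumed sampling rate (Hz)
t = np.arange(320) / fs            # one 40 ms frame
frame = np.sin(2 * np.pi * 125 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)

f0 = estimate_f0(frame, fs)
pitch_period_ms = 1000.0 / f0
print(f0, pitch_period_ms)
```

For reference, the 60~400 Hz range corresponds to pitch periods between roughly 16.7 ms and 2.5 ms.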

Lossless Tube Model

The vocal tract is represented in a simplified way as follows:

• Assume the vocal tract as a lossless nonuniform tube

(Because the tube is non-uniform, its height varies along its length.)

• The nonuniform tube can be approximated as concatenation of N uniform tubes.

(Each uniform section has the same horizontal width.)

• Apply the wave equation and derive a transfer function (think of it as the filter) from the glottis to the lips
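
In the usual textbook treatment of concatenated lossless tubes, each junction between two sections is described by a reflection coefficient r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k), where A_k is the cross-sectional area of section k. The post itself does not give this formula, so treat the sketch below as background under that assumption; the area values are hypothetical.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube sections.

    areas: cross-sectional areas A_1..A_N, ordered from glottis to lips.
    Returns r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for k = 1..N-1.
    """
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]

# Hypothetical areas (arbitrary units) for a 4-section tube.
areas = [1.0, 3.0, 3.0, 1.0]
r = reflection_coefficients(areas)
print(r)
```

A junction between two equal areas gives r = 0, meaning no reflection, which is why a uniform tube behaves as a single section.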

Examples of speech sounds

Speech production mechanism

Speech is produced when the source passes through the filter of the vocal cords and then the linear filter of the vocal tract.

Short-time Fourier transform

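Why the short-time Fourier transform is needed

In theory, the Fourier transform operates on an input that extends from negative infinity to positive infinity,

but human speech changes over time, so we need to capture the frequency characteristics at each moment of the utterance.

The signal is therefore cut into frames of 20-30 ms.

→ Muscles cannot change position instantaneously, so the signal is roughly stationary within a frame. ("가" is pronounced as "ㄱ" + "ㅏ", in units of about 20~30 ms.)

The window is slid along the signal.
A Hamming window is commonly used.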

→ Hamming window gives good trade-off between mainlobe width and sidelobe attenuation.
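
The framing and windowing described above can be sketched with NumPy as follows. The 25 ms frame length and 10 ms hop are assumed values (the notes only say 20-30 ms), and the function name `stft` is just a local helper, not a library call.

```python
import numpy as np

def stft(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Short-time Fourier transform with a sliding Hamming window.

    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([
        x[i * hop: i * hop + frame_len] * window   # cut a frame, apply window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)             # one spectrum per frame

fs = 8000                                # assumed sampling rate (Hz)
t = np.arange(fs) / fs                   # 1 second of signal
x = np.sin(2 * np.pi * 1000 * t)         # test tone at 1000 Hz

S = stft(x, fs)
peak_bins = np.argmax(np.abs(S), axis=1)
bin_hz = fs / 200                        # 200 samples per 25 ms frame
print(S.shape, peak_bins[0] * bin_hz)
```

Taking `np.abs(S)` (usually on a log scale) and plotting it with time on the x-axis and frequency on the y-axis gives a spectrogram.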

Signal when pronouncing "가나다" (top: waveform, bottom: spectrogram)

Pitch period (period = 1/F0)

F0 = Pitch = Fundamental frequency

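** The pitch period depends mainly on the individual speaker, rather than on what is being pronounced.

Comparing the spectra of vowels

(a) and (b) are the same sound pronounced by different speakers (different pitch),

(b) and (c) are different sounds pronounced by the same speaker (same pitch).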

Cepstral analysis

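Analyzing speech in the frequency domain means that there is a filter corresponding to the vocal tract

plus a filter corresponding to the vocal cords, and the goal is to separate them and find the parameters of each.

First, (a) is Fourier-transformed into (b).

Taking the log of (b) turns a product into a sum, log(xy) = log(x) + log(y), so the two contributions become additive. This is how (c) is obtained.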

The low part (vocal tract) can be used to tell which sound was pronounced,

and the high part (vocal cords) is the region used to analyze who is speaking (speaker recognition).
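
A minimal sketch of this idea is the real cepstrum: the inverse Fourier transform of the log magnitude spectrum. The synthetic voiced frame, the cutoff of 20 samples between the "low" and "high" parts, and the filter shape are all assumptions for illustration.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum of a Hamming-windowed frame."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))

fs = 8000                                  # assumed sampling rate (Hz)
N = 512                                    # frame length in samples
period = 64                                # pitch period in samples (F0 = 125 Hz)

# Synthetic voiced frame: impulse train (source) convolved with a
# decaying resonance (a stand-in for the vocal tract filter).
source = np.zeros(N)
source[::period] = 1.0
k = np.arange(period)
h = (0.95 ** k) * np.cos(2 * np.pi * 700 / fs * k)
voiced = np.convolve(source, h)[:N]

c = real_cepstrum(voiced)

cutoff = 20                                # assumed boundary between low and high
low_part = c[:cutoff]                      # slowly varying envelope (vocal tract)
# In the high part, search lags that correspond to 60~400 Hz.
lag_min, lag_max = int(fs / 400), int(fs / 60)
pitch_lag = lag_min + int(np.argmax(c[lag_min:lag_max + 1]))
print(pitch_lag, fs / pitch_lag)
```

The peak in the high part sits at the pitch period, while the first few coefficients describe the smooth spectral envelope.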

Wideband Spectrogram vs. Narrow-Band Spectrogram

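Taking a 20~30 ms window as the reference:

With a wideband spectrogram, there is little visible variation along the vertical (frequency) direction, but variation along the x-axis (time) shows up clearly.

It shows fine detail in time and a smoothed view in frequency.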

Narrow-Band Spectrogram

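With a narrow-band spectrogram, there is little visible variation along the horizontal (time) direction, but variation along the vertical (frequency) axis shows up clearly.

It shows a smoothed view in time and fine detail in frequency.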
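
The trade-off comes from the analysis window length: as a rough rule of thumb, a window of T seconds resolves frequencies about 1/T Hz apart. The sketch below applies that rule; the 5 ms and 30 ms window lengths and the 100 Hz pitch are assumed values, and the actual resolution also depends on the window shape.

```python
def window_tradeoff(window_ms):
    """Rule-of-thumb time and frequency resolution for a given window length."""
    return {
        "time_resolution_ms": window_ms,
        "freq_resolution_hz": 1000.0 / window_ms,
    }

wideband = window_tradeoff(5.0)      # assumed short window
narrowband = window_tradeoff(30.0)   # assumed long window

f0 = 100.0  # hypothetical pitch in Hz

# Individual harmonics (spaced F0 apart) are only separated when the
# frequency resolution is finer than F0.
wideband_resolves_harmonics = wideband["freq_resolution_hz"] < f0
narrowband_resolves_harmonics = narrowband["freq_resolution_hz"] < f0
print(wideband, narrowband)
```

With these values the short window cannot separate harmonics 100 Hz apart (about 200 Hz resolution), while the long window can (about 33 Hz resolution).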

License: Attribution, No Derivatives
