Vanilla RNN & Seq2seq & attention

sarah0518 2023. 4. 5. 20:03

Vanilla RNN

The three parameters (weight matrices) that must be updated in a vanilla RNN

1. Wxh

2. Whh

3. Who
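
As a rough illustration of where the three matrices act, here is a minimal pure-Python sketch of one vanilla RNN time step. The tanh activation, the omission of bias terms, the toy sizes, and the helper names `matvec` and `rnn_step` are assumptions made for this example, not something the notes specify.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x, h_prev, Wxh, Whh, Who):
    """One vanilla RNN time step: returns (new hidden state, output)."""
    pre = [p + q for p, q in zip(matvec(Wxh, x), matvec(Whh, h_prev))]
    h = [math.tanh(v) for v in pre]   # h_t = tanh(Wxh x_t + Whh h_{t-1})
    o = matvec(Who, h)                # o_t = Who h_t
    return h, o

# Toy sizes (assumed): 2 input features, 3 hidden units, 2 outputs.
Wxh = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
Whh = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
Who = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

h = [0.0, 0.0, 0.0]
outputs = []
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h, o = rnn_step(x, h, Wxh, Whh, Who)   # the same weights are reused at every step
    outputs.append(o)
```

The same Wxh, Whh, and Who are shared across all time steps, which is why these three are the only matrices that training has to update.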

Drawback of the vanilla RNN

The influence of an input shrinks the further back in the past it lies,

so the network ends up with only short-term memory
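
The shrinking influence can be made concrete with a small experiment. The sketch below assumes a scalar tanh RNN, h_t = tanh(w * h_{t-1} + u * x_t), and tracks how strongly the initial state still affects h_t via the chain rule. The function name and the values of `w`, `u`, and the inputs are hypothetical choices for illustration.

```python
import math

def influence_of_first_state(w, u, xs, h0=0.0):
    """Return |d h_t / d h_0| after each step of a scalar tanh RNN."""
    h, grad, history = h0, 1.0, []
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)   # chain rule through one time step
        history.append(abs(grad))
    return history

# Assumed toy setting: recurrent weight 0.9, input weight 0.5, 20 identical inputs.
history = influence_of_first_state(w=0.9, u=0.5, xs=[1.0] * 20)
```

Each step multiplies the sensitivity by a factor smaller than 1 in magnitude, so the values in `history` decay toward zero as the distance from the first state grows.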

Solution) attention

Limitation of RNN, LSTM, and GRU

On their own they cannot convert a sequence into a sequence of a different length

Solution) Seq2seq

Example) translation

Seq2seq with Attention

In simplified form, the structure can be drawn as below

The key point is that it uses a context vector

Context vector

A vector obtained by mapping the input tokens into a vector space of a fixed dimension

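** In speech recognition, the frame is taken as 50

** The encoder and the decoder have different time axes (= sequences of different lengths can be converted)

** Decoding ends at the step where </s> (sentence end) is predicted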

Drawback) an information bottleneck arises

: All information in the input sequence is encoded into a fixed-size hidden state.

At every decoder time step, what was encoded into that fixed-size hidden state

is applied identically, whatever the decoder step t is
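
To see the bottleneck in miniature, the sketch below uses a toy encoder that folds token ids into one hidden state. The update rule, the hidden size `HIDDEN`, and the function names are made up for this illustration and do not represent a trained model; the point is only that the context has the same size for a short and a long input, and that every decoder step receives the same vector.

```python
import math

HIDDEN = 4  # assumed hidden size

def encode(tokens):
    """Toy encoder: fold token ids into one fixed-size hidden state."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = [math.tanh(0.5 * h[i] + 0.1 * (tok + i)) for i in range(HIDDEN)]
    return h

def decoder_contexts(context, num_steps):
    """Without attention, every decoder step sees the same context vector."""
    return [context for _ in range(num_steps)]

short_ctx = encode([3, 1, 4])
long_ctx = encode([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
per_step = decoder_contexts(long_ctx, num_steps=5)
```

Both `short_ctx` and `long_ctx` hold exactly `HIDDEN` numbers, so a longer input has to be squeezed into the same capacity.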

Solution) attention

** With attention, a different hidden state (context) is applied at each decoder time step t

Attention

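The score is the dot product of an encoder vector and a decoder vector,

so it can be expressed as a single value

(the dot product is taken to measure similarity; it equals the cosine similarity multiplied by the norms of the two vectors)

That single value can be turned into a weight (the scores are normalized with softmax), and

by multiplying each encoder vector by its weight and summing the results

→ a different hidden state is applied at each decoder time step t

** Note: because the encoder and decoder vectors are combined with a dot product,

** number of nodes in the encoder's last hidden layer = number of nodes in the decoder's last hidden layer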
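
The steps above (dot-product score, weight, weighted sum) can be sketched as follows. This is a minimal pure-Python version under assumed toy vectors; the function names `dot`, `softmax`, and `attend` are chosen for this example, and the softmax normalization is the usual choice rather than something the notes spell out.

```python
import math

def dot(a, b):
    """Dot product; requires encoder and decoder vectors of equal size."""
    if len(a) != len(b):
        raise ValueError("encoder and decoder vectors must have the same size")
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Assumed toy encoder states and two different decoder states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1, c1 = attend([2.0, 0.0], encoder_states)
w2, c2 = attend([0.0, 2.0], encoder_states)
```

The two decoder states produce different weights and therefore different context vectors, and the size check in `dot` reflects the requirement that both sides have the same number of nodes.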

In the example, the attention distribution has its largest value at the input token "I"

→ the output is likely to be "Je"

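** Speech is a time-series pattern, so its attention is linear (the alignment advances monotonically in time)

→ the outputs matched to a speech sequence attend to the input in a very linear, in-order fashion

Contrast example) a caption for an image of a dog suddenly jumping in a park does not show linear attention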

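Q, K, V in Attention

Query: in the figure above, the query asks "which encoder context vector should be applied to the 4th decoder context vector?"; it is the decoder vector at that step

Key: the encoder vectors that are dot-multiplied with the decoder vector

Value: the set of vectors that enter the weighted sum

self-attention: attention computed only within the encoder, or only within the decoder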
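
The roles of query, key, and value can be written as one small function. The sketch below is a simplified illustration: it uses the encoder vectors directly as both keys and values and leaves out the learned projection matrices that a Transformer would apply, and the function name `attention` and the toy vectors are assumptions for this example.

```python
import math

def attention(query, keys, values):
    """Dot-product attention: weight each value by softmax(query . key)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 3.0]

# Encoder-decoder attention: the query comes from the decoder,
# keys and values come from the encoder.
cross = attention(decoder_state, encoder_states, encoder_states)

# Self-attention: queries, keys, and values all come from the same sequence.
self_attended = [attention(h, encoder_states, encoder_states)
                 for h in encoder_states]
```

The only difference between the two calls is where the query comes from, which is the distinction the definitions above draw.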

Autoregressive:

The first word is output, and that output is fed back in as input,

so it influences the output of the second word
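
A minimal sketch of this feedback loop is shown below. The lookup table `NEXT_WORD` is a hypothetical stand-in for a trained decoder, and the words after "Je" are invented for the example; generation stops when </s> is predicted.

```python
# Hypothetical next-word table standing in for a trained decoder.
NEXT_WORD = {"<s>": "Je", "Je": "suis", "suis": "ici", "ici": "</s>"}

def generate(next_word, start="<s>", end="</s>", max_len=10):
    """Feed each predicted word back in as the next input until `end`."""
    words, current = [], start
    for _ in range(max_len):
        current = next_word[current]   # the prediction depends on the previous output
        if current == end:
            break
        words.append(current)
    return words

sentence = generate(NEXT_WORD)
```

The `max_len` cap is a common safeguard in case the end token is never predicted.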

License: attribution required, no derivatives
