keypoint detection - Harris, SIFT, SURF, LIFT

IT/영상처리분석

keypoint detection - Harris, SIFT, SURF, LIFT

sarah0518 2022. 4. 18. 19:59

728x90

Keypoint detection을 사용하여

3차원 복원
occulusion복원
카메라 앵글
image stitching(panorama)
객체인식

등을 파악할 수 있다.

Kepoint detectors

1. Harris detector

2. SIFT detector

3. SURF detector

4. LIFT detector

Harris detector

- edge의 경우 edge direction방향으로 움직이게 되면 변화가 없음

- corner의 경우 필터를 움직이게 되면 큰 변화가 있는 것을 파악할 수 있음

- 또한 corner는 2개의 direction이 다른 edge가 만나는 것이므로

2개이상의 방향으로 gradient가 크게 변한다면 corner라고 볼 수 있다.

→ 이러한 특성을 통해 keypoint를 검출함

Harris detector 원리

window를 이동시키면서 intensity의 변화량을 계산하는 것

수식으로 정리하면 아래의 E(u,v)가 0이면 flat, E(u,v)가 큰값이면 corner라고 판단함

Harris corner detector 수식정리

the eigenvalues (λ𝟏, λ𝟐) of the matrix M can be
obtained to compare the magnitudes of the eigenvalues(화살표의 길이).

λ는 변화량의 큰 정도를 나타내는 값으로, 타원에서 장축, 단축으로 이해하면 됨

**참고로 Mv=λv에서 v는 화살표의 방향을, λ는 화살표의 길이를 나타냄

(λ𝟏, λ𝟐) 에서

• If both are small, it is a flat surface,
• if only one is large, it is edge, and
• if both are large, it is a corner.

Lambda값이 크다 → 코너를 뜻함

R에서는

• If 𝑹 > 𝟎 : corner,
• if 𝑹 < 𝟎 : edge,
• If |𝑹| is small : a flat region

lambda1	lambda2	수식 (k=0.04)	값 (R = corner response)
10	10	1010 - 0.04(10+10)^2	84
10	0.1	10*0.1 - 0.04(10+0.1)^2	<0
0.1	0.1	0.1*0.1 - 0.04(0.2)^2	≒10

오른쪽 그림에서 원표시된 것이 코너를 뜻함, 원의크기는 코너 response의 크기(코너의 scale)

Harris detector에서 고려할 점(문제점)

scale 변화에 따라 코너가 edge처럼 보일 수 있으므로

scale 변환을 해줘야함

→ Scale invariant feature transform (SIFT)

SIFT(Scale Invariant Feature Transformation)

위의 harris detector의 문제점도 해결하고

keypoint detection을 통해

image matching(uniqueness, distinctiveness, similarity)을 할 수 있음

The SIFT algorithm has 4 basic steps.

1. DoG를 이용하여 sclae space의 최대/최소값(extrema) 계산

2. key point candidate중에 잡음을 제거(low constrast point들, Tx treshold사용)

3. local 이미지에서의 key point들의 방향을 정함

4. key point들의 방향과 magnitude를 가지고 생성한 descriptor(vector)로

각 key point에서의 local image descriptor를 계산함

** DoG(Difference of Gaussian) : Gaussian 처리해준 이미지의 차이. 즉, 시그마들의 차이임

참고로, 1단계에서 local extrema를 계산하면 그것이 key point들이 되는 것임

Overview of SIFT

local extrema detection = keypoint를 뽑는 것

SIFT의 공식 5단계

1. Scale space construction (scale space로 부터 DoG space를 구할 수 있음)

2. Difference of Gaussian 계산

3. Local extrema detection

4. Descriptor construction

5. Image matching

1. Scale space construction

scale space를 적용할때 각 다른 σ 값(보통 root(2)의 배수)

→ 이미지가 점점 더 blur해짐

그 다음에는(octave2) 실제로 이미지의 사이즈를 줄여서

다른 시그마값을 적용해줌

Scale space 구성

• Octave 1 uses scale σ

• Octave 2 uses scale 2σ

2. Difference of Gaussian 계산

DoG(Laplacian of Gaussian: LoG와 유사) space란?

각 Octave내의 인접한 이미지들을 차이값

DoG값이 크게 나타나는 것이 scale space의 extrema임

여러개의 octave를 쓰는 이유는 더 많은 scale의 변화량을 고려하기 위해서임

참고로 아래 그림에서처럼 gaussian filter를 씌우면 점점 blur해지는데,

이는 scale을 작게했다가 다시 크게하면 이미지가 흐려지는 것과 동일한 벙법으로

gauusian filter를 씌우는 것도 blur해지는 것이므로 scaling하는 작업이라 할 수 있음

이미지의 scaling작업은 gaussian filter를 씌우는 것과 같다.

3. Local Extrema Detection

- DoG space에서 각각의 pixel값이 주변에 26개의 인접한 pixel값과 비교했을 때,

항상 큰 값이면 maxima, 항상 작으면 minima가 됨

그 다음, low contrast에 대해 filter out 해줌(refine 해줌)

추가로 harris corner detector를 통해 keypoint가 edge에 align되어있을 때도, 제외해줌

4. Descriptor construction

- 위에서 구해진 keypoint(extrema)들을 통해 descriptor를 만듦

descriptor(vector) length를 128로 보통 사용함

- 위의 descriptor에 window(4*4)를 적용함

- 각각의 window에서의 direction으로 길이가 8인 histogram을 생성함 (8*4*4)

(45도 단위로 구획을 나누기 때문에 길이가 8임)

pixel개수가 늘어나도 8차원으로 고정임

5. Image matching

(이미지 유사도를 판단할때 keypoint를 활용해서 할 수 있음)

key point들간의 유사도를 판단함

L2 distance방법을 사용

유사도 판단시, threshold를 사용

→ 유사한 point가 여러개 있다면 그것은 unique한 keypoint가 되지 못하므로

→ uniqueness, distinctiveness, similarity를 판단할 수 있음

threshold보다 작으려면 d2는 적어도 d1의 2배의 distance가 되어야 함.

scale = 물체의 크기 = gaussian sigma의 값

Larger scale – larger object - smaller base image

(octave커질 수록 scale이 커지는 것임)

→ octave가 커질 수록(scale이 커질수록) 원본이미지에서 더 큰 window를 가짐

왜냐햐면 octave가 클수록 scale이 크기 때문에

SIFT의 공식 7단계

1. Construct multi-scale pyramid

2. Find local extrema (candidate keypoints) in multi-scale pyramid

3. Eliminate unstable keypoints (such as points on edges)

중요! (Edge에 있는 point를 왜 제거해야하는가? edge에 있는 point들은 재현성이 떨어짐)

Edge에 있는 point들은 keypoint의 특성(두 방향으로 gradient가 커야함)을 가지지 않음

filter out 해주는 것임 (Tx의 threshold 사용)

4. Interpolate keypoints (local quadratic function)

(keypoint가 pixel의 중심점이 아닐경우 interpolation을 사용하여 중심점으로 이동 시켜줌)

(보간된 keypoint를 가지고 뒤에 특징벡터로 저장하여 사용할 것임)

5. Decide window size for each keypoint

to calculate major orientation (for rotation invariance) and descriptor construction

(scale이 크면 더 큰 window를 사용)

6. Construct descriptor at the scale where the keypoint is detected

7. Calculate similarity between images(L2 distance 사용)

(Harris detector와 가장큰 차이는 scale정보를 사용한다는 것임)

SIFT에서 사용하는 parameter들

DoG(x, y) >= Tx 에서 사용하는 Tx값:

Tx가 커질 수록 accuracy가 떨어짐

(왜냐하면, Tx가 threshold기 때문에, 클수록 검출되는 keypoint 개수가 작아져서 성능이 떨어짐)

#feat: 261.23의 의미:

→ 이미지당 261개의 feature가 뽑혔다는 뜻

비슷한 성능이라면 keypoint가 많은 것보다는 적은것이 더 좋음

→ keypoint가 늘어나면 연산량 증가, 속도가 느려짐(L2 distance 계산시)

다양한 kepoint detector중에서 SIFT-DoG가 성능이 보통 제일 좋음

- Harris Laplace, Hessian Laplace, LoG(Laplacian of Gaussian)등에 비해서

Stabilities of Keypoints

: 이미지가 변형 이동되었더라고 해도

2개의 이미지에서 유사한 keypoint를 뽑아야 성능이 좋다고 할 수 있음

(위의 SIFT detector와 같이)

SURF(Speeded up robust features) detector

https://vision0814.tistory.com/172

SURF(Speed-Up Robust Features)

SURF는 2008년 Herbert Bay가 제안한 방법입니다. [Bay 08] Bay, Herbert, et al. "Speeded-up robust features (SURF)." Computer vision and image understanding 110.3 (2008): 346-359. 2004년 SIFT의 등장..

vision0814.tistory.com

위의 블로그 설명이 매우 잘 나와있으니 꼭 참고해보기

SURF detector의 특징

- Harr wavelet 필터사용

아래 5, 6번째 단계에서 key point orientation과 interest point description을 계산하기 위해 사용

기존의 SIFT는 왼쪽과 같은 gradient를 사용했다면

SURF는 오른쪽과같이 0,1로 된 필터를 사용하여 gradient(orientation)를 계산함(두개 모두 거의 유사한 방법임)

- box filter 사용(해시안 행렬을 더 쉽고 빠르게 계산)

- BLOB detector 사용

keypoint를 찾기 위해 해시안 행렬에 기반한 BLOB detector 사용

- integral image미리 계산해서 사용함

SIFT와 SURF의 비교

SURF에서 maxima는 "+", minima는 "-" key point라고 설정해줌

나중에 plus끼리 매칭하고, minus 끼리 매칭함

SIFT에서는 maxima, minima를 동일한 keypoint로 취급함

SURF 알고리즘 설명

1. integral image

→ 왼쪽 위 끝에서부터 현재 좌표까지의 pixel의 합

아래와 같이 integral 이미지를 만들어놓음으로써 연산을 조금더 빨리 할 수 있음

(d-b-c+a)로 색칠된 영역을 구할 수 있음

→ Hessian Matrix Determinant를 구할 때 사용함

2. Hessian Matrix를 통해 Determinant를 계산하면

Determinant값이 큰 것이 keypoint라고 할 수 있음

4. Application of the Non-maximal suppression and threshold

26개의 nearest neighbours와 비교하여 가장큰 determinant 값을 찾는 것

(keypoint candidate를 얻게 됨)

5. Computation of the interest point orientation(=max orientation)

6. Computation of the interest point descriptor

descriptor를 생성하기 위해 관심점 주위 20scale 크기의 window를 적용하고

max orientation방향으로 회전시킨뒤 계산함

SURF의 2가지 특징(장점)

– Scale and rotation(in plane rotation) invariant

– Insensitive to different viewing angels

** in-plane rotation: Z-축 방향으로 회전시키는 것(view point 변경)

** out-of-plane rotation: X,Y-축 방향으로 회전시키는 것

LIFT detector(Learned Invariant Feature Transform)

view point chage가 많은 것은 여전히 SIFT가 성능이 좋음

Genuine pair에서는 L2 distance가 작게,

imposter pair에서는 L2 distance가 크게 되도록 network를 학습시킴

** LIFT는 scale변화가 없을 때 성능이 좋게나옴

왜냐하면 LIFT pipeline에서는 scale관련된 내용이 process 상에 없음

** SURF에 대해 추가 참고할만한 블로그

https://blog.naver.com/official_ksk/222528536506

Computer Vision | 디스크립터 Descriptor [이론 6-2 ]

디스크립터 * ORB - Oriented BRIEF(Binary Robust Independent Elelmentary Features) ...

blog.naver.com

728x90

'IT > 영상처리분석' 카테고리의 다른 글

Face Detection (0)	2022.05.23
여러 종류의 Feature descriptor (0)	2022.05.16
Line & Ellipse detection (0)	2022.04.17
Edge Detection (0)	2022.03.31
컬러 이미지 처리 (1)	2022.03.21

현재글keypoint detection - Harris, SIFT, SURF, LIFT

sarah0518