[Object Detection] 2 stage Detectors

CS Study/AI 2023. 5. 3. 02:30

R-CNN

Selective Search: splits the image into a very large number of small regions, then merges them step by step

Pipeline

Take one input image
Extract about 2,000 RoIs (candidate regions) with Selective Search
Resize every RoI to the same size (the input size of the CNN's FC layers is fixed)
Feed each RoI into the CNN to extract features
- Uses a pretrained AlexNet (with an added FC layer), then fine-tunes it
Feed the CNN features into an SVM for classification
- input: 2000x4096 features
- output: class (one of C+1) + confidence scores
Feed the CNN features into a regressor to predict the bounding box

Training

AlexNet fine-tuning
- IoU > 0.5: positive samples
- IoU < 0.5: negative samples
- Each batch has 32 positive samples and 96 negative samples
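
The IoU rule above can be sketched in a few lines. This is a minimal illustration, not the original training code: the `(x1, y1, x2, y2)` box format, the function names, and treating an IoU of exactly 0.5 as positive are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_rois(rois, gt_boxes, pos_thresh=0.5):
    """Label each RoI 1 (positive) or 0 (negative) by its best IoU with any ground-truth box."""
    labels = []
    for roi in rois:
        best = max((iou(roi, gt) for gt in gt_boxes), default=0.0)
        # Assumption: IoU equal to the threshold counts as positive.
        labels.append(1 if best >= pos_thresh else 0)
    return labels


rois = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 10)]
labels = label_rois(rois, gt_boxes)
```

The second RoI overlaps half of the ground-truth box, but its IoU is only 1/3, so it ends up as a negative sample.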

Linear SVM
- Dataset composition
  - Ground truth: positive samples
  - IoU < 0.3: negative samples
- Hard negative mining: targets false positives
  - Background samples that the classifier struggles to recognize as background are forcibly mined as negative samples for the next batch
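
Hard negative mining can be illustrated with a toy example. The scorer below is a hypothetical stand-in for the trained SVM, and the features and threshold are made up for illustration only.

```python
def mine_hard_negatives(background_feats, score_fn, threshold=0.0):
    """Return background samples that the current classifier wrongly scores as objects."""
    hard = []
    for feat in background_feats:
        # A background sample scored above the threshold is a false positive.
        if score_fn(feat) > threshold:
            hard.append(feat)
    return hard


def toy_score(feat):
    # Hypothetical linear classifier with weights (1, -1) and zero bias.
    return 1.0 * feat[0] - 1.0 * feat[1]


backgrounds = [(0.2, 0.9), (0.8, 0.1), (0.5, 0.4)]
hard = mine_hard_negatives(backgrounds, toy_score)
```

The samples collected in `hard` would be added to the negative set for the next training round, so the classifier sees the backgrounds it currently gets wrong.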

Bbox regressor
- Dataset composition
  - IoU > 0.6: positive samples
- Loss function: MSE loss
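
As a rough sketch of what the regressor learns, the targets below use the center/size offset parameterization commonly associated with R-CNN (center offsets normalized by the proposal size, log-scale width and height). The function names and the `(x1, y1, x2, y2)` box format are assumptions.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h


def regression_targets(proposal, gt):
    """Offsets (tx, ty, tw, th) that move the proposal onto the ground-truth box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def mse_loss(pred, target):
    """Mean squared error over the four offsets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


targets = regression_targets((0, 0, 10, 10), (2, 0, 12, 10))
loss = mse_loss((0.0, 0.0, 0.0, 0.0), targets)
```

Here the ground-truth box is the proposal shifted 2 pixels to the right, so only the x offset is non-zero.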

Drawbacks

All 2,000 regions have to pass through the CNN (heavy computation, slow)
Forced warping, which may hurt performance
The CNN, SVM, and bbox regressor are trained separately
Not end-to-end

SPPNet

Drawbacks

The CNN, SVM, and bbox regressor are trained separately
Not end-to-end

Fast R-CNN

Pipeline

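Feed the image into a CNN to extract features (VGG16)
Compute the RoIs on the feature map via RoI Projection
- The 2,000 RoIs from Selective Search are projected onto the feature map
- RoI Projection: scales the RoI down to feature-map size
Extract fixed-size features with RoI Pooling
- Needed to obtain a fixed-length vector
- Uses SPP (RoI Pooling is SPP with a single pyramid level)
After the fully connected layers, pass through a softmax classifier and a bbox regressor
- Number of classes: C+1 (background)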
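
RoI Projection and RoI Pooling can be sketched on a tiny single-channel feature map. The stride of 16 (the usual downsampling factor for VGG16's last conv feature map), the 2x2 output size, and the bin-splitting scheme are assumptions for illustration. Real implementations work on multi-channel tensors.

```python
import math


def project_roi(roi, stride=16):
    """Map an (x1, y1, x2, y2) box from image pixels to feature-map cells (end exclusive)."""
    x1, y1, x2, y2 = roi
    return (math.floor(x1 / stride), math.floor(y1 / stride),
            math.ceil(x2 / stride), math.ceil(y2 / stride))


def roi_max_pool(feature_map, froi, out_size=2):
    """Max-pool the projected RoI into a fixed out_size x out_size grid."""
    x1, y1, x2, y2 = froi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Row range of bin i; every bin covers at least one row.
        r0 = y1 + (i * h) // out_size
        r1 = max(r0 + 1, y1 + ((i + 1) * h + out_size - 1) // out_size)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = max(c0 + 1, x1 + ((j + 1) * w + out_size - 1) // out_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature_map = [[0, 1, 2, 3],
               [4, 5, 6, 7],
               [8, 9, 10, 11],
               [12, 13, 14, 15]]
froi = project_roi((0, 0, 64, 64))
pooled = roi_max_pool(feature_map, froi)
```

Whatever the size of the projected RoI, the output is always `out_size x out_size`, which is what lets the FC layers accept it.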

Training

Uses a multi-task loss (classification loss + bbox regression loss)

Loss function
- classification: cross entropy
- bbox regressor: Smooth L1
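
A minimal sketch of the two loss terms and how they might be combined. The weight `lam`, using class index 0 for background, and skipping the box term for background RoIs are assumptions rather than details stated in this post.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def cross_entropy(probs, label):
    """Negative log-probability of the true class."""
    return -math.log(probs[label])


def multi_task_loss(probs, label, pred_box, target_box, lam=1.0):
    """Classification loss plus a box loss that only applies to non-background RoIs."""
    cls_loss = cross_entropy(probs, label)
    loc_loss = 0.0
    if label > 0:  # assumption: class 0 is background and has no box target
        loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss


loss = multi_task_loss([0.5, 0.5], 1, (0.5, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
```

Compared with a squared loss, Smooth L1 grows only linearly for large errors, so outlier boxes influence the gradient less.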

Dataset composition
- IoU > 0.5: positive samples
- 0.1 < IoU < 0.5: negative samples
- 25% positive samples, 75% negative samples

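Training
- Hierarchical sampling
  - R-CNN stores every RoI found in the images and samples from all of them
  - so one batch contains RoIs from many different images
  - Fast R-CNN builds a batch from the RoIs of only a few images (N = 2 in the paper)
  - so computation and memory can be shared within a batch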
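
Hierarchical sampling can be sketched as "pick a few images first, then pick RoIs inside each". The defaults (2 images, 128 RoIs per batch, 25% positives) follow the commonly cited Fast R-CNN configuration, and the `rois_by_image` data layout is a hypothetical one chosen for this example.

```python
import random


def hierarchical_sample(rois_by_image, n_images=2, rois_per_batch=128,
                        pos_fraction=0.25, rng=random):
    """Sample images first, then positive/negative RoIs within each sampled image.

    rois_by_image maps image_id -> {"pos": [...], "neg": [...]} (hypothetical layout).
    """
    per_image = rois_per_batch // n_images
    n_pos = int(per_image * pos_fraction)
    batch = []
    for image_id in rng.sample(sorted(rois_by_image), n_images):
        pos = rois_by_image[image_id]["pos"]
        neg = rois_by_image[image_id]["neg"]
        chosen_pos = rng.sample(pos, min(n_pos, len(pos)))
        # Fill the rest of this image's quota with negatives.
        chosen_neg = rng.sample(neg, min(per_image - len(chosen_pos), len(neg)))
        batch += [(image_id, roi) for roi in chosen_pos + chosen_neg]
    return batch


# Toy dataset: 4 images, each with 50 positive and 200 negative placeholder RoIs.
data = {
    img: {"pos": [("p", i) for i in range(50)], "neg": [("n", i) for i in range(200)]}
    for img in ["a", "b", "c", "d"]
}
batch = hierarchical_sample(data, rng=random.Random(0))
```

Because all RoIs in the batch come from only two images, the convolutional feature maps of those two images can be computed once and reused for every RoI.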

Drawbacks

Not end-to-end (RoIs still come from Selective Search, outside the network)

Faster R-CNN

Pipeline

Feed the image into a CNN to extract feature maps
Compute RoIs with the RPN
- Replaces Selective Search with anchor boxes (a very important concept)
- Generating anchor boxes over the original image yields a huge number of region proposals.
- RPN: the network that extracts region proposals for the original image
  - Takes the feature map from VGG16 and returns a class score and bbox regression output for each anchor
- RPN steps
  1. Take the CNN feature map as input
  2. Apply a 3x3 conv to produce the intermediate layer
  3. Apply a 1x1 conv for binary classification
    - 2 (object or not) * 9 (number of anchors) channels
  4. Apply a 1x1 conv for bbox regression
    - 4 (bounding box) * 9 (number of anchors) channels
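
To make the "9 anchors" and the channel counts concrete, here is a small sketch that places anchors at every feature-map cell. The stride of 16, the three scales, and the three aspect ratios are assumed values in the style of the usual Faster R-CNN setup, and the function name is made up for this example.

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors in image coordinates, 9 per feature-map cell."""
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (fx + 0.5) * stride
            cy = (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


num_anchors = 9
cls_channels = 2 * num_anchors  # object / not object per anchor
reg_channels = 4 * num_anchors  # four box offsets per anchor

anchors = make_anchors(2, 3)
```

Even a 2x3 feature map already yields 54 anchors, which is why a full-size image produces so many candidate regions.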
Use NMS to reduce overlap among the n regions predicted by the RPN
- NMS: removes near-duplicate RPN proposals
  - Sort the proposals by class score
  - Proposals whose IoU with a higher-scoring proposal is 0.7 or more are treated as duplicates and removed (repeat)
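
The NMS procedure described above can be sketched as a greedy loop. This is a plain-Python illustration with an assumed `(x1, y1, x2, y2)` box format. Production code would normally use a vectorized library routine instead.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.7):
    """Return indices of the boxes kept after greedy non-maximum suppression."""
    # Highest score first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores)
```

The second box has an IoU of about 0.82 with the first, so it is suppressed, while the distant third box survives.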

Training

RPN

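To train the classifier and regressor in the RPN stage, anchor boxes are split into positive / negative samples
Dataset composition
- IoU > 0.7, or the anchor with the highest IoU for a ground-truth box: positive samples
- IoU < 0.3: negative samples
- Everything else: not used for training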
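
The anchor labeling rule can be sketched as follows, with 1 for positive, 0 for negative, and -1 for ignored anchors. The label encoding is an assumption, and the second pass reflects one reading of the ground-truth condition: the anchor that overlaps each ground-truth box the most is made positive even if its IoU is below 0.7.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored) for each anchor."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # in between: not used for training
    # Assumption: each ground-truth box also claims its best-overlapping anchor.
    for gt in gt_boxes:
        ious = [iou(anchor, gt) for anchor in anchors]
        if ious and max(ious) > 0:
            labels[ious.index(max(ious))] = 1
    return labels


anchors = [(0, 0, 10, 10), (0, 0, 10, 20), (50, 50, 60, 60), (100, 100, 120, 110)]
gt_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
labels = label_anchors(anchors, gt_boxes)
```

The last anchor has an IoU of only 0.5 with the second ground-truth box, but it is the best match for that box, so it still becomes a positive sample.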

Fast R-CNN

After the RPN, proposals are split into positive / negative samples to train Fast R-CNN
Dataset composition
- IoU > 0.5: positive samples -> 32
- IoU < 0.5: negative samples -> 96
- The mini-batch consists of 128 samples

RPN & Fast R-CNN

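Uses 4-step alternating training
1. Train the RPN on an ImageNet-pretrained backbone
2. Train Fast R-CNN on an ImageNet-pretrained backbone, using the RPN trained in step 1
3. Freeze the backbone fine-tuned in step 2 and train the RPN
4. Keep the step-2 backbone frozen and train Fast R-CNN, using the RPN trained in step 3
Because this procedure is complicated, Approximate Joint Training is now commonly used (the losses are combined and backpropagated in a single pass)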

Blog with 2 stage Detectors paper summaries

약초의 숲으로 놀러오세요 (Come visit the Herb Forest)

herbwood.tistory.com
