Speech representation and data exploration

1. Visualization of recordings - input features

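- Uses MFCC as the feature

** What is MFCC?

MFCCs (Mel-frequency cepstral coefficients) are features extracted from an audio signal that capture the characteristic properties of the sound. They are mainly used for audio-domain problems such as speech recognition, speaker identification, and genre classification.

1. Split the audio signal into frames (typically 20 ms to 40 ms) [time domain] -> apply the FFT [frequency domain] -> obtain the spectrum. -> This gives the intensity of each frequency band (which frequencies in the signal are weak and which are strong).

2. [Filtering] Apply a Mel filter bank to the spectrum to obtain the Mel spectrum.

3. [Information extraction] Apply cepstral analysis to the Mel spectrum to obtain the MFCCs. The MFCCs come out of the process of separating the formants (the dominant frequency regions, i.e. the peaks) from the rest of the spectrum.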
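The three steps above can be sketched with NumPy alone. This is a minimal illustration and not librosa's implementation: the function names (`mel_filter_bank`, `simple_mfcc`), the 25 ms frame, the 10 ms hop, the 40 Mel bands, and the use of a DCT-II as the cepstral step are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style Hz -> Mel conversion.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def simple_mfcc(signal, sr, frame_ms=25, hop_ms=10, n_mels=40, n_mfcc=13):
    """Return an array of shape (n_frames, n_mfcc)."""
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len

    # Step 1: framing (time domain) -> windowed FFT (frequency domain) -> power spectrum.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Step 2: apply the Mel filter bank to get the Mel spectrum.
    mel_spec = power @ mel_filter_bank(n_mels, frame_len, sr).T

    # Step 3: cepstral analysis, here a DCT-II of the log Mel spectrum.
    log_mel = np.log(mel_spec + 1e-10)
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)
    basis = np.cos(np.pi * k[:, None] * (2 * n[None, :] + 1) / (2 * n_mels))
    return log_mel @ basis.T
```

Keeping only the first few coefficients (13 here) retains the smooth spectral envelope, where the formants live, and discards the fine detail.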

- In this notebook transcription, the Mel power spectrogram and the MFCCs are computed with the librosa Python package.

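import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

# train_audio_path and filename are assumed to be defined earlier in the notebook.
librosa_samples, librosa_sample_rate = librosa.load(str(train_audio_path)+filename)

# Mel spectrogram (the signal is passed as y=..., which recent librosa versions require)
S = librosa.feature.melspectrogram(y=librosa_samples, sr=librosa_sample_rate, n_mels=128, fmax=8000)

# Convert to log scale (dB), using the peak power (max) as the reference.
log_S = librosa.power_to_db(S, ref=np.max)

plt.figure(figsize=(12,4))
# Use the sample rate returned by librosa.load so the time axis is correct.
librosa.display.specshow(log_S, sr=librosa_sample_rate, x_axis='time', y_axis='mel')

plt.title('Mel Power spectrogram')
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()

# Extract the MFCCs from the log-power Mel spectrogram.
mfcc = librosa.feature.mfcc(S=log_S, n_mfcc=13)

# Second-order delta (acceleration) of the MFCCs; this is what gets plotted below.
delta2_mfcc = librosa.feature.delta(mfcc, order=2)

plt.figure(figsize=(12,4))
librosa.display.specshow(delta2_mfcc)
plt.ylabel('MFCC coeffs')
plt.xlabel('Time')
plt.title('MFCC')
plt.colorbar()
plt.tight_layout()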

1.4. Silence Removal

- Uses VAD (Voice Activity Detection)
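The notes do not say which VAD is used, so the following is only a rough stand-in: a simple energy-threshold trimmer written with NumPy. The function name `trim_silence`, the 20 ms frame, and the -40 dB threshold are assumptions for illustration, and a real VAD is considerably more robust to background noise.

```python
import numpy as np

def trim_silence(signal, sr, frame_ms=20, threshold_db=-40.0):
    """Keep only frames whose RMS level is within |threshold_db| dB of the loudest frame.

    Samples left over after the last full frame are dropped.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    if rms.max() == 0.0:
        # The whole clip is digital silence, so nothing is kept.
        return signal[:0]
    level_db = 20.0 * np.log10(rms / rms.max() + 1e-10)
    voiced = level_db > threshold_db
    return frames[voiced].reshape(-1)
```

Because the threshold is relative to the loudest frame, the same setting works for quiet and loud recordings, at the cost of keeping noise in clips that contain no speech at all.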

1.6. Features extraction steps

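The feature extraction algorithm proceeds as follows.

1. Resampling
2. VAD (Voice Activity Detection)
3. Padding with zeros so that all signals have the same length
4. Log spectrogram (or MFCC, or PLP)
5. Feature normalization with mean and standard deviation
6. Stacking a given number of frames to capture temporal information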
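The last four steps of this pipeline (padding, log spectrogram, normalization, frame stacking) can be sketched in NumPy as below. Resampling and VAD are assumed to have been applied beforehand. All function names, the 320-sample frame, the 160-sample hop, and the context of 2 frames are hypothetical choices for the example, not values taken from the notebook.

```python
import numpy as np

def pad_to_length(signal, target_len):
    """Zero-pad (or truncate) so the signal has exactly target_len samples."""
    signal = np.asarray(signal, dtype=np.float64)[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

def log_spectrogram(signal, frame_len=320, hop_len=160, eps=1e-10):
    """Log power spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)

def normalize(features, eps=1e-8):
    """Per-frequency-bin mean and standard deviation normalization."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

def stack_frames(features, context=2):
    """Concatenate each frame with `context` neighbors on each side (edges repeated)."""
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    n = features.shape[0]
    return np.concatenate([padded[i : i + n] for i in range(2 * context + 1)], axis=1)

# Usage on a synthetic 0.75 s clip at 16 kHz, padded to 1 s.
rng = np.random.default_rng(0)
clip = rng.standard_normal(12000)
padded_clip = pad_to_length(clip, 16000)
features = stack_frames(normalize(log_spectrogram(padded_clip)))
```

Stacking gives each frame a view of its neighbors, which is how a frame-level model gets access to temporal context.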

2.5. Frequency components across the words

- Derives the frequency characteristics of each word.
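One way to look at per-word frequency characteristics is to average the FFT magnitude over all recordings of the same word. The sketch below assumes that approach; the function name `mean_spectrum_per_word`, the dict-of-lists input layout, and the requirement that all clips have the same length are assumptions for illustration.

```python
import numpy as np

def mean_spectrum_per_word(recordings, sr):
    """Average magnitude spectrum for each word.

    recordings: dict mapping a word to a list of equal-length 1-D sample arrays.
    Returns (freqs, spectra) where spectra maps each word to its mean spectrum.
    """
    spectra = {}
    n_samples = None
    for word, clips in recordings.items():
        mags = [np.abs(np.fft.rfft(np.asarray(c, dtype=np.float64))) for c in clips]
        spectra[word] = np.mean(mags, axis=0)
        n_samples = len(clips[0])
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sr)
    return freqs, spectra

# Usage with synthetic "words": pure tones stand in for real recordings.
sr = 8000
t = np.arange(sr) / sr
recordings = {
    "low": [np.sin(2 * np.pi * 200 * t + p) for p in (0.0, 0.5, 1.0)],
    "high": [np.sin(2 * np.pi * 1000 * t + p) for p in (0.0, 0.5, 1.0)],
}
freqs, spectra = mean_spectrum_per_word(recordings, sr)
```

Plotting `spectra[word]` against `freqs` for each word shows where that word concentrates its energy.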
