Session 14 — AI Ethics for NLP
Goals: recognize and measure bias, apply fairness metrics, explain predictions (explainability), and manage privacy and data leakage in NLP projects.
Learning Outcomes: (1) Compute per-group fairness metrics; (2) Check per-group calibration; (3) Explain the model with coefficients/SHAP; (4) Identify potential privacy violations and reduce data leakage.
1) Core Concepts
- Data bias: imbalanced representation or labeling that produces unfair model behavior.
- Fairness metrics: Demographic Parity (selection rate), Equal Opportunity (TPR parity), Equalized Odds (TPR & FPR parity).
- Calibration: probability scores that are consistent across groups (simple Brier/ECE checks).
- Explainability: feature coefficients (linear models), SHAP/LIME for local explanations.
- Privacy: data minimization, k-anonymity, detection of data leakage & train/test contamination.
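To make the fairness-metric definitions concrete before the full audit below, here is a minimal sketch computing selection rate, TPR, and FPR per group with NumPy. The arrays and the `basic_fairness` helper are hypothetical toy examples, not course data:

```python
import numpy as np

# Toy labels/predictions for two groups (illustration only)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])

def basic_fairness(y_true, y_pred, group):
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        tp = ((yt == 1) & (yp == 1)).sum()
        fn = ((yt == 1) & (yp == 0)).sum()
        fp = ((yt == 0) & (yp == 1)).sum()
        tn = ((yt == 0) & (yp == 0)).sum()
        out[g] = {
            'selection_rate': yp.mean(),   # demographic parity compares this across groups
            'TPR': tp / max(tp + fn, 1),   # equal opportunity compares TPR
            'FPR': fp / max(fp + tn, 1),   # equalized odds compares TPR and FPR
        }
    return out

stats = basic_fairness(y_true, y_pred, group)
print(stats)
```

Here both groups have the same selection rate (demographic parity holds) yet different TPR/FPR, showing why the three criteria must be checked separately.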
2) Google Colab Practicum — Fairness & Explainability Audit
We reuse the model from Sessions 7/10 (LogReg on TF–IDF). Add a simple simulated sensitive attribute, or load one from meta_labels.csv if available.
A. Setup & Data
!pip -q install pandas numpy scikit-learn matplotlib shap
import numpy as np, pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, brier_score_loss, roc_auc_score
import matplotlib.pyplot as plt
# 1) Load labeled data
try:
    df = pd.read_csv('logreg_dataset_sessi7.csv')  # columns: text, y
except FileNotFoundError:
    base = pd.read_csv('corpus_sessi3_variants.csv')['v2_stop_stemID'].dropna().astype(str).tolist()
    POS = {"bagus","mantap","menyenangkan","cepat","baik","excellent","impressive","friendly","tajam","bersih"}
    NEG = {"buruk","lambat","telat","downtime","lemah","weak","late","dented","smelled","failed","delay"}
    def weak_label(t):
        # Weak labeling: positive if more POS than NEG words, negative if the reverse
        w = set(t.split()); p = len(w & POS); n = len(w & NEG)
        if p > n: return 1
        if n > p: return 0
        return None
    df = pd.DataFrame({'text': base, 'y': [weak_label(t) for t in base]}).dropna()
# 2) Add a sensitive attribute (simulated, or loaded from metadata)
try:
    meta = pd.read_csv('meta_labels.csv')  # columns: id (optional), group
    df = df.reset_index(drop=True)
    df['group'] = meta['group'][:len(df)].fillna('A')
except FileNotFoundError:
    # Heuristic: group A if the text contains certain keywords, else B (simulation only)
    KEWA = {"ibu","mbak","sista"}
    KEWB = {"bapak","mas","bro"}
    def assign_group(t):
        w = set(t.split())
        if len(w & KEWA) > 0: return 'A'  # e.g. female-coded terms
        if len(w & KEWB) > 0: return 'B'  # e.g. male-coded terms
        return np.random.choice(['A','B'])
    df['group'] = df['text'].apply(assign_group)
print(df['group'].value_counts())
# 3) Split & vectorize (stratify on the label-group combination so both stay balanced)
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    df['text'], df['y'].astype(int), df['group'], test_size=0.3,
    stratify=df['y'].astype(str) + '_' + df['group'], random_state=42)
vec = TfidfVectorizer(ngram_range=(1,2), min_df=2, max_df=0.95, sublinear_tf=True, norm='l2')
Xtr = vec.fit_transform(X_train)
Xte = vec.transform(X_test)
clf = LogisticRegression(max_iter=300, class_weight='balanced')
clf.fit(Xtr, y_train)
proba = clf.predict_proba(Xte)[:,1]
yhat = (proba>=0.5).astype(int)
print('Overall Test Report')
print(classification_report(y_test, yhat, digits=3))
B. Per-Group Fairness Metrics
def group_metrics(y_true, y_prob, y_pred, group):
    rows = []
    for g in sorted(np.unique(group)):
        m = group == g
        yt, yprob, ypred = y_true[m], y_prob[m], y_pred[m]
        tn, fp, fn, tp = confusion_matrix(yt, ypred).ravel()
        sel_rate = (ypred == 1).mean()        # demographic parity proxy
        tpr = tp / (tp + fn + 1e-12)          # equal opportunity
        fpr = fp / (fp + tn + 1e-12)
        brier = brier_score_loss(yt, yprob)   # calibration proxy
        auc = roc_auc_score(yt, yprob) if len(np.unique(yt)) > 1 else np.nan
        rows.append([g, sel_rate, tpr, fpr, brier, auc, tp, fp, fn, tn, len(yt)])
    return pd.DataFrame(rows, columns=['group','selection_rate','TPR','FPR','Brier','ROC_AUC','TP','FP','FN','TN','N'])
per_group = group_metrics(y_test.values, proba, yhat, g_test.values)
print(per_group)
# Disparity: max-min gap across groups
dp_diff = per_group['selection_rate'].max() - per_group['selection_rate'].min()
eo_diff = per_group['TPR'].max() - per_group['TPR'].min()
eo2_diff = (per_group['FPR'].max() - per_group['FPR'].min())
print(f'ΔDemographicParity={dp_diff:.3f} ΔTPR={eo_diff:.3f} ΔFPR={eo2_diff:.3f}')
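The Brier score above summarizes calibration in one number but hides where miscalibration occurs; `sklearn.calibration.calibration_curve` gives a per-group reliability view. A minimal sketch on synthetic scores (hypothetical data, not the course model; group B is deliberately generated miscalibrated):

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
n = 2000
# Group A: labels drawn to match the scores (well calibrated)
p_A = rng.uniform(0, 1, n)
y_A = (rng.uniform(0, 1, n) < p_A).astype(int)
# Group B: outcome rates flatter than the scores (over/underconfident at the extremes)
p_B = rng.uniform(0, 1, n)
y_B = (rng.uniform(0, 1, n) < 0.5 * p_B + 0.25).astype(int)

gaps = {}
for name, y, p in [('A', y_A, p_A), ('B', y_B, p_B)]:
    frac_pos, mean_pred = calibration_curve(y, p, n_bins=5)
    # Mean |observed - predicted| per bin: a simple ECE-style summary
    gaps[name] = float(np.abs(frac_pos - mean_pred).mean())
print(gaps)
```

In a real audit, replace the synthetic arrays with `y_test`/`proba` masked per group and plot the reliability curves.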
C. Per-Group Threshold Tuning (Equal Opportunity)
# Find a per-group threshold that brings the TPR close to a common target
def find_threshold_for_tpr(y_true, y_prob, target_tpr=0.8):
    thr_list = np.linspace(0.1, 0.9, 41)
    best = 0.5; best_gap = 1.0
    for t in thr_list:
        yb = (y_prob >= t).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, yb).ravel()
        tpr = tp / (tp + fn + 1e-12)
        if abs(tpr - target_tpr) < best_gap:
            best_gap = abs(tpr - target_tpr); best = t
    return best
mA = (g_test.values == 'A'); mB = (g_test.values == 'B')
thr_A = find_threshold_for_tpr(y_test.values[mA], proba[mA])
thr_B = find_threshold_for_tpr(y_test.values[mB], proba[mB])
yhat_adj = np.where(mA, (proba >= thr_A).astype(int), (proba >= thr_B).astype(int))
per_group_adj = group_metrics(y_test.values, proba, yhat_adj, g_test.values)
print('\nAfter per-group thresholds:')
print(per_group_adj)
D. Explainability: Coefficients & SHAP
# Feature coefficients (global)
feat = vec.get_feature_names_out()
coef = clf.coef_.ravel()
ix_pos = coef.argsort()[::-1][:20]
ix_neg = coef.argsort()[:20]
print('Top + features:', [(feat[i], round(float(coef[i]),3)) for i in ix_pos])
print('Top - features:', [(feat[i], round(float(coef[i]),3)) for i in ix_neg])
# SHAP (local explanations for 20 examples)
import shap
# Note: recent SHAP versions use feature_perturbation; the old feature_dependence argument was removed
explainer = shap.LinearExplainer(clf, Xtr, feature_perturbation="interventional")
shap_values = explainer.shap_values(Xte[:20])
# Rank features by mean absolute SHAP value
abs_mean = np.abs(shap_values).mean(axis=0)
ix = abs_mean.argsort()[::-1][:20]
print('Top SHAP features:', [(feat[i], round(float(abs_mean[i]),4)) for i in ix])
E. Leakage Checks & Privacy Quick Wins
# 1) Check for exact train↔test duplicates (a simple leakage signal)
tr_set = set(map(str, X_train))
leak = sum(1 for t in X_test if str(t) in tr_set)
print('Train→test duplicates (possible leakage):', leak)
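The exact-match check misses trivial variants (casing, extra whitespace). A small hedged extension is to normalize before comparing; the `normalize` helper and the sample sentences below are hypothetical, and this is a heuristic rather than full near-duplicate detection:

```python
import re

def normalize(t):
    # Lowercase and collapse whitespace so trivial variants still match
    return re.sub(r'\s+', ' ', t.lower().strip())

train = ['Pengiriman cepat sekali!', 'produk buruk']
test  = ['PENGIRIMAN cepat   sekali!', 'layanan ramah']

tr_norm = {normalize(t) for t in train}
leak_norm = sum(1 for t in test if normalize(t) in tr_norm)
print('normalized duplicates:', leak_norm)  # catches the casing/whitespace variant
```

For fuzzier overlap (paraphrases, token reordering), shingling or MinHash would be the next step.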
# 2) Simple PII masking (email/phone) before vectorization
import re
PII_EMAIL = re.compile(r"[\w\.-]+@[\w\.-]+")
PII_PHONE = re.compile(r"\b\+?\d[\d\-\s]{7,}\b")
def mask_pii(text):
    text = PII_EMAIL.sub(' ', text)
    text = PII_PHONE.sub(' ', text)
    return text
print('PII masking example:', mask_pii('hubungi saya di agus@kampus.id atau +62-812-1234-5678'))
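Section 1 also lists k-anonymity, which the masking step does not cover: a record set is k-anonymous if every combination of quasi-identifiers is shared by at least k records. A quick check can be sketched with a pandas groupby; the `meta` frame and its column names are hypothetical stand-ins for real user metadata:

```python
import pandas as pd

# Hypothetical quasi-identifiers; a real audit uses the actual metadata schema
meta = pd.DataFrame({
    'city':    ['Bandung', 'Bandung', 'Jakarta', 'Jakarta', 'Jakarta', 'Surabaya'],
    'age_bin': ['20-29',   '20-29',   '30-39',   '30-39',   '30-39',   '20-29'],
})

def k_anonymity(df, quasi_ids):
    # Smallest group size over all quasi-identifier combinations
    return int(df.groupby(quasi_ids).size().min())

k = k_anonymity(meta, ['city', 'age_bin'])
print('k-anonymity =', k)  # the lone Surabaya record makes k = 1: a re-identification risk
```

Groups below the chosen k are candidates for generalization (coarser bins) or suppression.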
F. Save Audit Artifacts
import joblib
joblib.dump({'per_group_before': per_group, 'per_group_after': per_group_adj}, 'fairness_report_sessi14.joblib')
joblib.dump({'vec': vec, 'clf': clf}, 'tfidf_logreg_sessi14.joblib')
print('Saved: fairness_report_sessi14.joblib, tfidf_logreg_sessi14.joblib')
3) Case Studies & Analysis
| Case | Ethical Issue | Mitigation | Notes |
|---|---|---|---|
| Comment moderation | Higher false positives for certain groups | Per-group thresholds + periodic audits | Monitor ΔTPR/ΔFPR |
| Ticket screening | Demographic-parity bias (skewed selection rates) | Reweighting/resampling + threshold tuning | Evaluate service impact |
| Sentiment analysis | Spurious features (regional slang) | Curate domain stopwords, SHAP audit | Retrain after fixes |
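The "reweighting" mitigation in the table can be sketched with sklearn's `sample_weight`: give each group equal total influence on the loss by weighting records inversely to their group frequency. The data below is synthetic and the inverse-frequency rule is a simple heuristic, not a tuned fairness method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000
group = rng.choice(['A', 'B'], size=n, p=[0.8, 0.2])  # group B under-represented
X = rng.normal(size=(n, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Inverse-frequency weights: each group contributes roughly equally to the loss
freq = {g: (group == g).mean() for g in np.unique(group)}
w = np.array([1.0 / freq[g] for g in group])

clf = LogisticRegression().fit(X, y, sample_weight=w)
print('weight A vs B:', round(1 / freq['A'], 2), round(1 / freq['B'], 2))
```

After reweighting, rerun the per-group metrics from section B to verify the selection-rate gap actually narrowed; reweighting alone does not guarantee it.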
4) Ethics Practicum Checklist
- ✔ Per-group metrics report (selection rate, TPR, FPR, Brier).
- ✔ Per-group ROC/PR curves (optional).
- ✔ Global (coefficient) & local (SHAP) explanations + example audit.
- ✔ Privacy steps (PII masking, data minimization) plus train↔test duplicate checks.
- ✔ Mitigation recommendations & a monitoring plan.
5) Mini Assignment (Graded)
- Build a fairness report (per-group table + Δparity) for your TF–IDF LogReg model.
- Tune per-group thresholds to equalize TPR (target 0.8) and report the impact on FPR & AP.
- Use SHAP on 20 examples to find the 5 most influential features; audit the 5 largest errors and propose data fixes.