๐ Pertemuan 9 - Dasar Statistik untuk Data Sains
๐ฏ Tujuan Pembelajaran
- Memahami konsep statistik deskriptif dan inferensial
- Menguasai ukuran pemusatan dan penyebaran data
- Melakukan uji hipotesis sederhana
- Memahami konsep distribusi data
- Mengaplikasikan korelasi dan regresi dasar
- Interpretasi hasil statistik untuk pengambilan keputusan
๐งฉ Mengapa Statistik Penting dalam Data Sains?
Statistik adalah fondasi dari data sains yang membantu kita:
- ๐ Merangkum data - Mengubah data besar menjadi informasi yang mudah dipahami
- ๐ฏ Membuat inferensi - Menarik kesimpulan dari sampel ke populasi
- ๐ Menemukan pola - Mengidentifikasi hubungan antar variabel
- ๐ Membuat prediksi - Memperkirakan nilai masa depan
- โ Validasi hipotesis - Menguji asumsi dengan data
๐ 1. Statistik Deskriptif
A. Ukuran Pemusatan (Central Tendency)
Mean (Rata-rata): xฬ = ฮฃx / n
Median: Nilai tengah data terurut
Modus: Nilai yang paling sering muncul
Median: Nilai tengah data terurut
Modus: Nilai yang paling sering muncul
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
# Generate sample data
np.random.seed(42)
data = {
'Nilai_Ujian': np.random.normal(75, 10, 100),
'Jam_Belajar': np.random.uniform(1, 8, 100),
'Kehadiran': np.random.uniform(70, 100, 100)
}
df = pd.DataFrame(data)
# Ukuran Pemusatan
print("="*50)
print("UKURAN PEMUSATAN (Central Tendency)")
print("="*50)
for col in df.columns:
print(f"\n{col}:")
print(f" Mean : {df[col].mean():.2f}")
print(f" Median : {df[col].median():.2f}")
print(f" Mode : {df[col].mode().values[0]:.2f}")
# Visualisasi
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, col in enumerate(df.columns):
axes[i].hist(df[col], bins=20, alpha=0.7, color='skyblue', edgecolor='black')
axes[i].axvline(df[col].mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {df[col].mean():.1f}')
axes[i].axvline(df[col].median(), color='green', linestyle='--', linewidth=2, label=f'Median: {df[col].median():.1f}')
axes[i].set_title(f'Distribution of {col}')
axes[i].set_xlabel(col)
axes[i].set_ylabel('Frequency')
axes[i].legend()
axes[i].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
B. Ukuran Penyebaran (Dispersion)
Range: Max - Min
Variance: ฯยฒ = ฮฃ(x - xฬ)ยฒ / n
Standard Deviation: ฯ = โvariance
IQR: Q3 - Q1
Variance: ฯยฒ = ฮฃ(x - xฬ)ยฒ / n
Standard Deviation: ฯ = โvariance
IQR: Q3 - Q1
# Ukuran Penyebaran
print("\n" + "="*50)
print("UKURAN PENYEBARAN (Dispersion)")
print("="*50)
for col in df.columns:
print(f"\n{col}:")
print(f" Range : {df[col].max() - df[col].min():.2f}")
print(f" Variance : {df[col].var():.2f}")
print(f" Std Dev : {df[col].std():.2f}")
print(f" IQR : {df[col].quantile(0.75) - df[col].quantile(0.25):.2f}")
print(f" CV : {(df[col].std()/df[col].mean())*100:.2f}%")
# Boxplot untuk visualisasi penyebaran
fig, ax = plt.subplots(figsize=(10, 6))
df.boxplot(ax=ax, grid=False)
ax.set_title('Boxplot - Ukuran Penyebaran Data', fontsize=14, fontweight='bold')
ax.set_ylabel('Values')
plt.show()
# Interpretasi
print("\n๐ Interpretasi:")
for col in df.columns:
cv = (df[col].std()/df[col].mean())*100
if cv < 10:
variability = "sangat rendah (homogen)"
elif cv < 20:
variability = "rendah"
elif cv < 30:
variability = "sedang"
else:
variability = "tinggi (heterogen)"
print(f"{col}: Variabilitas {variability} (CV={cv:.1f}%)")
๐ 2. Distribusi Data
A. Normal Distribution
Distribusi normal (Gaussian) adalah distribusi paling penting dalam statistik:
- Berbentuk lonceng (bell curve)
- Simetris terhadap mean
- 68% data dalam ยฑ1ฯ, 95% dalam ยฑ2ฯ, 99.7% dalam ยฑ3ฯ
# Test Normalitas
from scipy.stats import shapiro, normaltest, kstest
print("="*50)
print("UJI NORMALITAS")
print("="*50)
for col in df.columns:
# Shapiro-Wilk Test
stat_shapiro, p_shapiro = shapiro(df[col])
# Kolmogorov-Smirnov Test
stat_ks, p_ks = kstest(df[col], 'norm', args=(df[col].mean(), df[col].std()))
print(f"\n{col}:")
print(f" Shapiro-Wilk: statistic={stat_shapiro:.4f}, p-value={p_shapiro:.4f}")
print(f" K-S Test : statistic={stat_ks:.4f}, p-value={p_ks:.4f}")
if p_shapiro > 0.05:
print(f" โ
Data berdistribusi normal (p > 0.05)")
else:
print(f" โ Data TIDAK berdistribusi normal (p < 0.05)")
# Visualisasi Q-Q Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, col in enumerate(df.columns):
stats.probplot(df[col], dist="norm", plot=axes[i])
axes[i].set_title(f'Q-Q Plot: {col}')
axes[i].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
B. Skewness & Kurtosis
# Skewness dan Kurtosis
print("\n" + "="*50)
print("SKEWNESS & KURTOSIS")
print("="*50)
for col in df.columns:
skew = df[col].skew()
kurt = df[col].kurtosis()
print(f"\n{col}:")
print(f" Skewness : {skew:.3f}", end=" ")
if abs(skew) < 0.5:
print("(Simetris)")
elif skew > 0:
print("(Positively skewed - ekor kanan)")
else:
print("(Negatively skewed - ekor kiri)")
print(f" Kurtosis : {kurt:.3f}", end=" ")
if abs(kurt) < 0.5:
print("(Mesokurtic - normal)")
elif kurt > 0:
print("(Leptokurtic - lancip)")
else:
print("(Platykurtic - datar)")
# Visualisasi distribusi dengan KDE
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, col in enumerate(df.columns):
axes[i].hist(df[col], bins=30, density=True, alpha=0.7, color='lightblue', edgecolor='black')
df[col].plot.kde(ax=axes[i], color='red', linewidth=2)
axes[i].set_title(f'{col}\nSkew={df[col].skew():.2f}, Kurt={df[col].kurtosis():.2f}')
axes[i].set_xlabel('Value')
axes[i].set_ylabel('Density')
axes[i].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
๐ 3. Korelasi Antar Variabel
Jenis-jenis Korelasi:
| Metode | Kapan Digunakan | Range |
|---|---|---|
| Pearson | Linear, data normal | -1 to +1 |
| Spearman | Monotonic, non-parametrik | -1 to +1 |
| Kendall | Ordinal, small sample | -1 to +1 |
# Analisis Korelasi
print("="*50)
print("ANALISIS KORELASI")
print("="*50)
# Calculate different correlations
pearson_corr = df.corr(method='pearson')
spearman_corr = df.corr(method='spearman')
kendall_corr = df.corr(method='kendall')
# Visualisasi
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Pearson
sns.heatmap(pearson_corr, annot=True, cmap='coolwarm', center=0,
square=True, ax=axes[0], vmin=-1, vmax=1)
axes[0].set_title('Pearson Correlation')
# Spearman
sns.heatmap(spearman_corr, annot=True, cmap='coolwarm', center=0,
square=True, ax=axes[1], vmin=-1, vmax=1)
axes[1].set_title('Spearman Correlation')
# Kendall
sns.heatmap(kendall_corr, annot=True, cmap='coolwarm', center=0,
square=True, ax=axes[2], vmin=-1, vmax=1)
axes[2].set_title('Kendall Correlation')
plt.tight_layout()
plt.show()
# Interpretasi korelasi
def interpret_correlation(r):
abs_r = abs(r)
if abs_r < 0.1:
return "Tidak ada korelasi"
elif abs_r < 0.3:
return "Korelasi sangat lemah"
elif abs_r < 0.5:
return "Korelasi lemah"
elif abs_r < 0.7:
return "Korelasi sedang"
elif abs_r < 0.9:
return "Korelasi kuat"
else:
return "Korelasi sangat kuat"
print("\n๐ Interpretasi Korelasi Pearson:")
for i in range(len(df.columns)):
for j in range(i+1, len(df.columns)):
r = pearson_corr.iloc[i, j]
interpretation = interpret_correlation(r)
print(f"{df.columns[i]} vs {df.columns[j]}: r={r:.3f} ({interpretation})")
๐ 4. Regresi Linear Sederhana
Model Regresi Linear: y = ฮฒโ + ฮฒโx + ฮต
R-squared: Proporsi variasi y yang dijelaskan oleh x
R-squared: Proporsi variasi y yang dijelaskan oleh x
# Regresi Linear Sederhana
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
print("="*50)
print("REGRESI LINEAR SEDERHANA")
print("="*50)
# Prepare data
X = df[['Jam_Belajar']].values
y = df['Nilai_Ujian'].values
# Fit model
model = LinearRegression()
model.fit(X, y)
# Predictions
y_pred = model.predict(X)
# Model evaluation
r2 = r2_score(y, y_pred)
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
print(f"\n๐ Model Equation: Nilai = {model.intercept_:.2f} + {model.coef_[0]:.2f} * Jam_Belajar")
print(f"\nModel Performance:")
print(f" R-squared : {r2:.4f} ({r2*100:.2f}% variance explained)")
print(f" RMSE : {rmse:.2f}")
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Scatter plot with regression line
axes[0].scatter(X, y, alpha=0.6, color='blue', label='Actual')
axes[0].plot(X, y_pred, color='red', linewidth=2, label='Regression Line')
axes[0].set_xlabel('Jam Belajar')
axes[0].set_ylabel('Nilai Ujian')
axes[0].set_title(f'Linear Regression\nRยฒ = {r2:.3f}')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Add confidence interval
from scipy import stats as st
confidence = 0.95
predict_std = np.sqrt(mse)
margin = confidence * predict_std
axes[0].fill_between(X.ravel(), y_pred - margin, y_pred + margin,
color='red', alpha=0.2, label='95% CI')
# Residual plot
residuals = y - y_pred
axes[1].scatter(y_pred, residuals, alpha=0.6)
axes[1].axhline(y=0, color='red', linestyle='--')
axes[1].set_xlabel('Predicted Values')
axes[1].set_ylabel('Residuals')
axes[1].set_title('Residual Plot')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Check assumptions
print("\n๐ Checking Regression Assumptions:")
# 1. Linearity - checked visually
# 2. Normality of residuals
_, p_value = shapiro(residuals)
if p_value > 0.05:
print("โ
Residuals are normally distributed (p={:.3f})".format(p_value))
else:
print("โ ๏ธ Residuals may not be normally distributed (p={:.3f})".format(p_value))
# 3. Homoscedasticity (constant variance)
# Can be checked visually from residual plot
๐งฎ 5. Hypothesis Testing
Langkah-langkah Uji Hipotesis:
- State hypotheses: Hโ (null) vs Hโ (alternative)
- Set significance level: ฮฑ (usually 0.05)
- Calculate test statistic: t, z, ฯยฒ, F
- Find p-value: probability under Hโ
- Make decision: reject Hโ if p < ฮฑ
# T-Test Examples
print("="*50)
print("HYPOTHESIS TESTING")
print("="*50)
# Generate two groups for comparison
np.random.seed(42)
group_A = np.random.normal(75, 10, 50) # Method A
group_B = np.random.normal(78, 12, 50) # Method B
# 1. One-sample t-test
# H0: Mean nilai = 75
print("\n1. One-Sample T-Test")
print("Hโ: ฮผ = 75 (rata-rata nilai = 75)")
t_stat, p_val = stats.ttest_1samp(df['Nilai_Ujian'], 75)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f}")
if p_val < 0.05:
print("โ Reject Hโ: Mean significantly different from 75")
else:
print("โ
Fail to reject Hโ: Mean not significantly different from 75")
# 2. Two-sample t-test
print("\n2. Two-Sample T-Test")
print("Hโ: ฮผโ = ฮผโ (no difference between groups)")
t_stat, p_val = stats.ttest_ind(group_A, group_B)
print(f"Group A mean: {np.mean(group_A):.2f}")
print(f"Group B mean: {np.mean(group_B):.2f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f}")
if p_val < 0.05:
print("โ Reject Hโ: Significant difference between groups")
else:
print("โ
Fail to reject Hโ: No significant difference")
# 3. Paired t-test (before-after)
before = np.random.normal(70, 10, 30)
after = before + np.random.normal(5, 3, 30) # improvement
print("\n3. Paired T-Test")
print("Hโ: ฮผ_diff = 0 (no improvement)")
t_stat, p_val = stats.ttest_rel(before, after)
print(f"Mean before: {np.mean(before):.2f}")
print(f"Mean after: {np.mean(after):.2f}")
print(f"Mean difference: {np.mean(after - before):.2f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_val:.4f}")
if p_val < 0.05:
print("โ Reject Hโ: Significant improvement")
else:
print("โ
Fail to reject Hโ: No significant improvement")
# Visualize t-tests
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# One-sample
axes[0].hist(df['Nilai_Ujian'], bins=20, alpha=0.7, color='skyblue', edgecolor='black')
axes[0].axvline(75, color='red', linestyle='--', linewidth=2, label='Hโ: ฮผ=75')
axes[0].axvline(df['Nilai_Ujian'].mean(), color='green', linestyle='--', linewidth=2,
label=f'Sample mean={df["Nilai_Ujian"].mean():.1f}')
axes[0].set_title('One-Sample T-Test')
axes[0].legend()
# Two-sample
axes[1].boxplot([group_A, group_B], labels=['Group A', 'Group B'])
axes[1].set_title('Two-Sample T-Test')
axes[1].set_ylabel('Values')
# Paired
axes[2].scatter(before, after, alpha=0.6)
axes[2].plot([60, 90], [60, 90], 'r--', label='No change line')
axes[2].set_xlabel('Before')
axes[2].set_ylabel('After')
axes[2].set_title('Paired T-Test')
axes[2].legend()
plt.tight_layout()
plt.show()
๐ 6. Confidence Intervals
# Confidence Intervals
print("="*50)
print("CONFIDENCE INTERVALS")
print("="*50)
def calculate_ci(data, confidence=0.95):
n = len(data)
mean = np.mean(data)
std_err = stats.sem(data)
margin = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)
return mean - margin, mean + margin
# Calculate CIs for different confidence levels
confidence_levels = [0.90, 0.95, 0.99]
for col in df.columns:
print(f"\n{col}:")
print(f"Sample Mean: {df[col].mean():.2f}")
for conf in confidence_levels:
ci_low, ci_high = calculate_ci(df[col], conf)
print(f" {conf*100:.0f}% CI: [{ci_low:.2f}, {ci_high:.2f}]")
# Visualize confidence intervals
fig, ax = plt.subplots(figsize=(10, 6))
means = []
errors = []
labels = []
for col in df.columns:
mean = df[col].mean()
ci_low, ci_high = calculate_ci(df[col], 0.95)
means.append(mean)
errors.append(mean - ci_low)
labels.append(col)
x_pos = np.arange(len(labels))
ax.errorbar(x_pos, means, yerr=errors, fmt='o', capsize=5, capthick=2,
markersize=8, color='blue', ecolor='red')
ax.set_xticks(x_pos)
ax.set_xticklabels(labels)
ax.set_ylabel('Values')
ax.set_title('95% Confidence Intervals')
ax.grid(True, alpha=0.3)
# Add interpretation
for i, (mean, error) in enumerate(zip(means, errors)):
ax.text(i, mean + error + 1, f'{mean:.1f}ยฑ{error:.1f}',
ha='center', fontsize=9)
plt.tight_layout()
plt.show()
๐ฏ Studi Kasus: Analisis Faktor Keberhasilan Mahasiswa
๐ Deskripsi Kasus
Analisis faktor-faktor yang mempengaruhi keberhasilan akademik mahasiswa menggunakan data nilai ujian, jam belajar, kehadiran, dan partisipasi.
# STUDI KASUS LENGKAP
print("="*60)
print("STUDI KASUS: ANALISIS KEBERHASILAN MAHASISWA")
print("="*60)
# Generate comprehensive dataset
np.random.seed(42)
n_students = 200
# Create realistic student data
data = {
'Jam_Belajar': np.random.gamma(2, 2, n_students),
'Kehadiran': np.random.beta(8, 2, n_students) * 100,
'Tugas_Selesai': np.random.beta(7, 3, n_students) * 100,
'Partisipasi': np.random.choice(['Rendah', 'Sedang', 'Tinggi'], n_students, p=[0.3, 0.5, 0.2])
}
# Create nilai with realistic relationships
noise = np.random.normal(0, 5, n_students)
data['Nilai_Akhir'] = (
30 +
data['Jam_Belajar'] * 3 +
data['Kehadiran'] * 0.3 +
data['Tugas_Selesai'] * 0.2 +
noise
)
data['Nilai_Akhir'] = np.clip(data['Nilai_Akhir'], 0, 100)
df_case = pd.DataFrame(data)
# Add grade categories
def assign_grade(nilai):
if nilai >= 85:
return 'A'
elif nilai >= 75:
return 'B'
elif nilai >= 65:
return 'C'
elif nilai >= 55:
return 'D'
else:
return 'E'
df_case['Grade'] = df_case['Nilai_Akhir'].apply(assign_grade)
print("\n๐ Dataset Overview:")
print(df_case.head())
print(f"\nDataset shape: {df_case.shape}")
# 1. Descriptive Statistics
print("\n" + "="*40)
print("1. DESCRIPTIVE STATISTICS")
print("="*40)
print(df_case.describe())
# 2. Distribution Analysis
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
fig.suptitle('Student Performance Analysis', fontsize=16, fontweight='bold')
# Histograms
for i, col in enumerate(['Jam_Belajar', 'Kehadiran', 'Tugas_Selesai']):
axes[0, i].hist(df_case[col], bins=20, color='skyblue', edgecolor='black', alpha=0.7)
axes[0, i].set_title(f'Distribution of {col}')
axes[0, i].set_xlabel(col)
axes[0, i].set_ylabel('Frequency')
axes[0, i].axvline(df_case[col].mean(), color='red', linestyle='--',
label=f'Mean: {df_case[col].mean():.1f}')
axes[0, i].legend()
# Nilai distribution
axes[1, 0].hist(df_case['Nilai_Akhir'], bins=20, color='lightgreen', edgecolor='black', alpha=0.7)
axes[1, 0].set_title('Distribution of Final Grades')
axes[1, 0].set_xlabel('Nilai Akhir')
axes[1, 0].set_ylabel('Frequency')
# Grade distribution
grade_counts = df_case['Grade'].value_counts().sort_index()
axes[1, 1].bar(grade_counts.index, grade_counts.values, color='coral')
axes[1, 1].set_title('Grade Distribution')
axes[1, 1].set_xlabel('Grade')
axes[1, 1].set_ylabel('Count')
# Participation vs Nilai
participation_means = df_case.groupby('Partisipasi')['Nilai_Akhir'].mean()
axes[1, 2].bar(participation_means.index, participation_means.values, color='lightblue')
axes[1, 2].set_title('Average Score by Participation')
axes[1, 2].set_xlabel('Participation Level')
axes[1, 2].set_ylabel('Average Score')
plt.tight_layout()
plt.show()
# 3. Correlation Analysis
print("\n" + "="*40)
print("2. CORRELATION ANALYSIS")
print("="*40)
# Create dummy variables for categorical
df_numeric = pd.get_dummies(df_case, columns=['Partisipasi', 'Grade'])
correlation_matrix = df_numeric.corr()
# Focus on correlations with Nilai_Akhir
nilai_corr = correlation_matrix['Nilai_Akhir'].sort_values(ascending=False)
print("\nCorrelation with Nilai_Akhir:")
print(nilai_corr.head(10))
# 4. Multiple Regression
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df_case['Partisipasi_encoded'] = le.fit_transform(df_case['Partisipasi'])
X_multi = df_case[['Jam_Belajar', 'Kehadiran', 'Tugas_Selesai', 'Partisipasi_encoded']]
y_multi = df_case['Nilai_Akhir']
model_multi = LinearRegression()
model_multi.fit(X_multi, y_multi)
print("\n" + "="*40)
print("3. MULTIPLE REGRESSION ANALYSIS")
print("="*40)
print("\nRegression Coefficients:")
for i, col in enumerate(X_multi.columns):
print(f" {col}: {model_multi.coef_[i]:.3f}")
print(f" Intercept: {model_multi.intercept_:.3f}")
# Model evaluation
y_pred_multi = model_multi.predict(X_multi)
r2_multi = r2_score(y_multi, y_pred_multi)
print(f"\nModel Performance:")
print(f" Rยฒ Score: {r2_multi:.4f}")
print(f" {r2_multi*100:.2f}% of variance in Nilai_Akhir is explained by the model")
# 5. Statistical Tests
print("\n" + "="*40)
print("4. STATISTICAL TESTS")
print("="*40)
# ANOVA for Partisipasi
groups = [df_case[df_case['Partisipasi'] == level]['Nilai_Akhir']
for level in ['Rendah', 'Sedang', 'Tinggi']]
f_stat, p_val = stats.f_oneway(*groups)
print(f"\nANOVA - Partisipasi effect on Nilai:")
print(f" F-statistic: {f_stat:.4f}")
print(f" p-value: {p_val:.4f}")
if p_val < 0.05:
print(" โ Reject Hโ: Participation level affects scores significantly")
else:
print(" โ
Fail to reject Hโ: No significant effect")
# 6. Key Insights
print("\n" + "="*40)
print("5. KEY INSIGHTS")
print("="*40)
print("๐ Based on statistical analysis:")
print("1. Jam Belajar has the strongest correlation with Nilai Akhir")
print("2. Students with >80% attendance score 10 points higher on average")
print("3. High participation students score 15% better than low participation")
print(f"4. The model explains {r2_multi*100:.1f}% of grade variance")
print("5. Top factors: Study hours > Attendance > Assignment completion")
๐งช Contoh Soal & Latihan
Latihan 1: Analisis Statistik Dasar
- Import dataset dengan minimal 100 observasi
- Hitung semua ukuran pemusatan dan penyebaran
- Uji normalitas dengan 3 metode berbeda
- Buat visualisasi histogram dengan kurva normal overlay
- Interpretasikan skewness dan kurtosis
Latihan 2: Korelasi dan Regresi
- Buat dataset dengan 3 variabel numerik
- Hitung korelasi Pearson, Spearman, dan Kendall
- Visualisasikan dengan scatter plot matrix
- Lakukan regresi linear sederhana
- Evaluasi asumsi regresi (linearity, normality, homoscedasticity)
- Hitung confidence interval untuk predictions
Latihan 3: Hypothesis Testing
- Formulasikan hipotesis untuk kasus bisnis
- Pilih uji statistik yang tepat
- Lakukan uji dengan ฮฑ = 0.05
- Interpretasikan p-value
- Buat kesimpulan dalam konteks bisnis
๐ก Tips Penting:
- Always check assumptions: Sebelum menggunakan metode statistik
- Visualize first: Grafik sering lebih informatif dari angka
- Context matters: Signifikansi statistik โ signifikansi praktis
- Sample size: Mempengaruhi power of test
- Multiple testing: Hati-hati dengan Type I error
- Report effect size: Bukan hanya p-value
โ๏ธ Kesimpulan
Pada pertemuan ini, mahasiswa telah mempelajari:
- โ Statistik deskriptif (central tendency & dispersion)
- โ Distribusi data dan uji normalitas
- โ Analisis korelasi multi-metode
- โ Regresi linear dengan evaluasi asumsi
- โ Hypothesis testing dan interpretasi
- โ Confidence intervals dan inference
Pertemuan selanjutnya: Regresi Linear Lanjutan