Session 7 - Mini Case Study: Real-World Data Analysis
Learning Objectives
- Apply all the concepts from Sessions 1-6 in a complete analysis
- Perform exploratory data analysis on a real public dataset
- Produce informative visualizations and actionable insights
- Write a structured analysis report
- Handle the complexity of real-world data
Case Study Concept
In this session, we integrate all the skills learned so far:
- Data Import & Setup - Preparing the environment and loading data
- Exploratory Data Analysis - Understanding the characteristics of the data
- Data Cleaning - Cleaning and preparing the data
- Visualization - Creating informative charts
- Statistical Analysis - In-depth analysis
- Insights & Conclusions - Drawing conclusions
Case Study Dataset Options
CASE STUDY 1: COVID-19 Analysis for Indonesia
Case Description
An analysis of how the COVID-19 pandemic developed in Indonesia, using data from Johns Hopkins CSSE. The focus is on case trends, the recovery rate, and a comparison with other ASEAN countries.
Step 1: Setup and Data Import
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
# Setup style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
# Load COVID-19 data
url_confirmed = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
url_deaths = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
url_recovered = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
# Read data
df_confirmed = pd.read_csv(url_confirmed)
df_deaths = pd.read_csv(url_deaths)
df_recovered = pd.read_csv(url_recovered)
print("Data loaded successfully!")
print(f"Shape - Confirmed: {df_confirmed.shape}")
print(f"Shape - Deaths: {df_deaths.shape}")
print(f"Shape - Recovered: {df_recovered.shape}")
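The files loaded above are in wide format: one row per region and one column per date. As a minimal sketch, assuming a tiny hand-made table that only mimics that layout (the coordinates and counts are made up for illustration, not real data), `melt` can reshape it into a long format with one row per region and date, which is often easier to filter and group:

```python
import pandas as pd

# Hypothetical miniature of the wide layout (all numbers are made up)
wide = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Indonesia", "Malaysia"],
    "Lat": [-0.79, 4.21],
    "Long": [113.92, 101.98],
    "1/22/20": [0, 0],
    "1/23/20": [1, 2],
    "1/24/20": [3, 5],
})

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]

# One row per (region, date) instead of one column per date
long = wide.melt(id_vars=id_cols, var_name="Date", value_name="Confirmed")

# Column names such as "1/22/20" follow month/day/two-digit-year
long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
long = long.sort_values(["Country/Region", "Date"]).reset_index(drop=True)

print(long[["Country/Region", "Date", "Confirmed"]])
```

The wide layout is kept in the steps that follow; the long layout is simply an alternative when several countries need to be grouped at once.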
Step 2: Data Preprocessing
# Function to process the data for one country
def process_country_data(country_name):
    # Filter by country
    confirmed = df_confirmed[df_confirmed['Country/Region'] == country_name]
    deaths = df_deaths[df_deaths['Country/Region'] == country_name]
    recovered = df_recovered[df_recovered['Country/Region'] == country_name]
    # Fail loudly on an unknown country name instead of returning all zeros
    if confirmed.empty:
        raise ValueError(f"No data found for country: {country_name}")
    # Get date columns (skip the first 4 metadata columns)
    dates = df_confirmed.columns[4:]
    # Aggregate if multiple provinces
    confirmed_values = confirmed[dates].sum()
    deaths_values = deaths[dates].sum()
    recovered_values = recovered[dates].sum()
    # Create dataframe (column names look like 1/22/20: month/day/year)
    df_country = pd.DataFrame({
        'Date': pd.to_datetime(dates, format='%m/%d/%y'),
        'Confirmed': confirmed_values.values,
        'Deaths': deaths_values.values,
        'Recovered': recovered_values.values
    })
    # Calculate additional metrics
    # Caveat: the upstream repository stopped updating recovered counts partway
    # through the pandemic, so inspect the tail of 'Recovered' before
    # interpreting 'Active' and 'Recovery_Rate' for late dates
    df_country['Active'] = df_country['Confirmed'] - df_country['Deaths'] - df_country['Recovered']
    df_country['Daily_Confirmed'] = df_country['Confirmed'].diff().fillna(0)
    df_country['Daily_Deaths'] = df_country['Deaths'].diff().fillna(0)
    df_country['Death_Rate'] = (df_country['Deaths'] / df_country['Confirmed'] * 100).round(2)
    df_country['Recovery_Rate'] = (df_country['Recovered'] / df_country['Confirmed'] * 100).round(2)
    return df_country
# Process Indonesia data
df_indonesia = process_country_data('Indonesia')
print("\nIndonesia COVID-19 Data Processed")
print(df_indonesia.tail())
print(f"\nTotal Days: {len(df_indonesia)}")
print(f"Date Range: {df_indonesia['Date'].min()} to {df_indonesia['Date'].max()}")
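Daily figures derived with `diff()` inherit any quirks of the cumulative series. When a cumulative count is revised downward, the difference for that day becomes negative. The sketch below uses a made-up cumulative series to show how such days can be detected and, as one simple option, clipped to zero. Clipping is an assumption chosen for illustration, not a method prescribed by the data source.

```python
import pandas as pd

# Made-up cumulative counts with one downward revision on the fourth day
cumulative = pd.Series(
    [0, 10, 25, 22, 40],
    index=pd.date_range("2020-03-01", periods=5, freq="D"),
    name="Confirmed",
)

# Same derivation as Daily_Confirmed: first difference, first day set to 0
daily_raw = cumulative.diff().fillna(0)

# Count the days where the cumulative series went down
n_negative = int((daily_raw < 0).sum())

# One simple treatment: floor negative days at zero
daily_clipped = daily_raw.clip(lower=0)

print("Raw daily values:    ", daily_raw.tolist())
print("Negative days found: ", n_negative)
print("Clipped daily values:", daily_clipped.tolist())
```

Note the trade-off: after clipping, the daily values no longer add up to the final cumulative total, so the choice should be stated in the report.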
Step 3: Exploratory Data Analysis
# Statistical Summary
print("="*60)
print("STATISTICAL SUMMARY - INDONESIA COVID-19")
print("="*60)
# Latest statistics
latest = df_indonesia.iloc[-1]
print(f"\nLatest Statistics ({latest['Date'].strftime('%Y-%m-%d')}):")
print(f" Total Confirmed: {latest['Confirmed']:,}")
print(f" Total Deaths: {latest['Deaths']:,}")
print(f" Total Recovered: {latest['Recovered']:,}")
print(f" Active Cases: {latest['Active']:,}")
print(f" Death Rate: {latest['Death_Rate']:.2f}%")
print(f" Recovery Rate: {latest['Recovery_Rate']:.2f}%")
# Peak statistics
print("\nPeak Statistics:")
peak_daily = df_indonesia['Daily_Confirmed'].max()
peak_date = df_indonesia[df_indonesia['Daily_Confirmed'] == peak_daily]['Date'].values[0]
print(f" Highest Daily Cases: {peak_daily:,.0f} on {pd.to_datetime(peak_date).strftime('%Y-%m-%d')}")
peak_active = df_indonesia['Active'].max()
peak_active_date = df_indonesia[df_indonesia['Active'] == peak_active]['Date'].values[0]
print(f" Highest Active Cases: {peak_active:,.0f} on {pd.to_datetime(peak_active_date).strftime('%Y-%m-%d')}")
# Moving averages
df_indonesia['MA7_Confirmed'] = df_indonesia['Daily_Confirmed'].rolling(window=7).mean()
df_indonesia['MA30_Confirmed'] = df_indonesia['Daily_Confirmed'].rolling(window=30).mean()
print("\nAnalysis complete!")
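Two details of the step above are easy to miss: a rolling mean with `window=7` is undefined (NaN) for the first six rows, and the date of a peak can be read directly with `idxmax()` when the dates are the index. A minimal sketch on a made-up daily series, assuming a `DatetimeIndex` rather than a separate `Date` column:

```python
import pandas as pd

# Made-up daily counts over eight days
daily = pd.Series(
    [2, 4, 6, 8, 10, 12, 14, 16],
    index=pd.date_range("2021-07-01", periods=8, freq="D"),
)

# Default behaviour: NaN until a full 7-day window is available
ma7_strict = daily.rolling(window=7).mean()

# min_periods=1 averages whatever is available at the start of the series
ma7_partial = daily.rolling(window=7, min_periods=1).mean()

# idxmax returns the index label (here a date) of the largest value
peak_date = daily.idxmax()

print(ma7_strict)
print(ma7_partial)
print("Peak on:", peak_date.strftime("%Y-%m-%d"))
```

Whether to use `min_periods` is a judgment call: it removes the gap at the start of the chart, but the earliest values are averages over fewer than seven days.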
Step 4: Comprehensive Visualization
# Create comprehensive dashboard
fig = plt.figure(figsize=(20, 12))
fig.suptitle('COVID-19 Dashboard - Indonesia', fontsize=20, fontweight='bold')
# 1. Total Cases Timeline
ax1 = plt.subplot(2, 3, 1)
ax1.plot(df_indonesia['Date'], df_indonesia['Confirmed'], label='Confirmed', linewidth=2, color='#FF6B6B')
ax1.plot(df_indonesia['Date'], df_indonesia['Recovered'], label='Recovered', linewidth=2, color='#4ECDC4')
ax1.plot(df_indonesia['Date'], df_indonesia['Deaths'], label='Deaths', linewidth=2, color='#45B7D1')
ax1.set_title('Cumulative Cases Over Time', fontsize=14, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Number of Cases')
ax1.legend()
ax1.grid(True, alpha=0.3)
# 2. Daily New Cases
ax2 = plt.subplot(2, 3, 2)
ax2.bar(df_indonesia['Date'], df_indonesia['Daily_Confirmed'], color='#FFA07A', alpha=0.5, label='Daily Cases')
ax2.plot(df_indonesia['Date'], df_indonesia['MA7_Confirmed'], color='red', linewidth=2, label='7-Day MA')
ax2.plot(df_indonesia['Date'], df_indonesia['MA30_Confirmed'], color='darkred', linewidth=2, label='30-Day MA')
ax2.set_title('Daily New Cases with Moving Averages', fontsize=14, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('New Cases')
ax2.legend()
ax2.grid(True, alpha=0.3)
# 3. Active Cases
ax3 = plt.subplot(2, 3, 3)
ax3.fill_between(df_indonesia['Date'], df_indonesia['Active'], color='#FFD93D', alpha=0.6)
ax3.plot(df_indonesia['Date'], df_indonesia['Active'], color='#F39C12', linewidth=2)
ax3.set_title('Active Cases Over Time', fontsize=14, fontweight='bold')
ax3.set_xlabel('Date')
ax3.set_ylabel('Active Cases')
ax3.grid(True, alpha=0.3)
# 4. Death and Recovery Rates
ax4 = plt.subplot(2, 3, 4)
ax4.plot(df_indonesia['Date'], df_indonesia['Death_Rate'], label='Death Rate (%)', linewidth=2, color='#E74C3C')
ax4.plot(df_indonesia['Date'], df_indonesia['Recovery_Rate'], label='Recovery Rate (%)', linewidth=2, color='#27AE60')
ax4.set_title('Death Rate vs Recovery Rate', fontsize=14, fontweight='bold')
ax4.set_xlabel('Date')
ax4.set_ylabel('Percentage (%)')
ax4.legend()
ax4.grid(True, alpha=0.3)
# 5. Monthly Aggregation
df_indonesia['Month'] = df_indonesia['Date'].dt.to_period('M')
monthly = df_indonesia.groupby('Month').agg({
    'Daily_Confirmed': 'sum',
    'Daily_Deaths': 'sum'
}).tail(12)
ax5 = plt.subplot(2, 3, 5)
x = range(len(monthly))
width = 0.35
ax5.bar([i - width/2 for i in x], monthly['Daily_Confirmed'], width, label='Confirmed', color='#3498DB')
ax5.bar([i + width/2 for i in x], monthly['Daily_Deaths'], width, label='Deaths', color='#E74C3C')
ax5.set_title('Monthly Cases (Last 12 Months)', fontsize=14, fontweight='bold')
ax5.set_xticks(x)
ax5.set_xticklabels([str(m) for m in monthly.index], rotation=45)
ax5.legend()
ax5.grid(True, alpha=0.3)
# 6. Case Distribution Pie Chart
ax6 = plt.subplot(2, 3, 6)
latest_data = df_indonesia.iloc[-1]
sizes = [latest_data['Recovered'], latest_data['Deaths'], latest_data['Active']]
labels = ['Recovered', 'Deaths', 'Active']
colors = ['#27AE60', '#E74C3C', '#F39C12']
explode = (0.1, 0, 0)
ax6.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax6.set_title('Current Case Distribution', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
print("\nVisualizations complete!")
Step 5: ASEAN Comparative Analysis
# Compare with ASEAN countries
asean_countries = ['Indonesia', 'Malaysia', 'Singapore', 'Thailand', 'Philippines', 'Vietnam']
# Process all ASEAN data
asean_data = {}
for country in asean_countries:
    try:
        asean_data[country] = process_country_data(country)
    except Exception:
        print(f"Warning: data not available for {country}")
# Create comparison dataframe
comparison_df = pd.DataFrame()
for country, data in asean_data.items():
    latest = data.iloc[-1]
    comparison_df = pd.concat([comparison_df, pd.DataFrame({
        'Country': [country],
        'Total_Cases': [latest['Confirmed']],
        'Total_Deaths': [latest['Deaths']],
        'Death_Rate': [latest['Death_Rate']],
        'Recovery_Rate': [latest['Recovery_Rate']]
    })], ignore_index=True)
# Sort by total cases
comparison_df = comparison_df.sort_values('Total_Cases', ascending=False)
# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('ASEAN COVID-19 Comparison', fontsize=16, fontweight='bold')
# Total Cases
axes[0, 0].barh(comparison_df['Country'], comparison_df['Total_Cases'], color='skyblue')
axes[0, 0].set_title('Total Confirmed Cases')
axes[0, 0].set_xlabel('Cases')
# Death Rate
axes[0, 1].barh(comparison_df['Country'], comparison_df['Death_Rate'], color='salmon')
axes[0, 1].set_title('Death Rate (%)')
axes[0, 1].set_xlabel('Percentage')
# Recovery Rate
axes[1, 0].barh(comparison_df['Country'], comparison_df['Recovery_Rate'], color='lightgreen')
axes[1, 0].set_title('Recovery Rate (%)')
axes[1, 0].set_xlabel('Percentage')
# Cases per Death ratio
comparison_df['Cases_per_Death'] = comparison_df['Total_Cases'] / comparison_df['Total_Deaths']
axes[1, 1].barh(comparison_df['Country'], comparison_df['Cases_per_Death'], color='gold')
axes[1, 1].set_title('Cases per Death (Higher is Better)')
axes[1, 1].set_xlabel('Ratio')
plt.tight_layout()
plt.show()
print("\nASEAN Comparison:")
print(comparison_df.to_string(index=False))
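When reading the comparison, it helps to know that "cases per death" is simply the reciprocal of the death rate (100 divided by the rate in percent), so the two charts rank countries in exactly opposite order. The sketch below checks this on made-up totals for three hypothetical countries and also shows an alternative way to build the table, from a list of records in a single call:

```python
import pandas as pd

# Made-up totals for three hypothetical countries (not real data)
records = [
    {"Country": "A", "Total_Cases": 1000, "Total_Deaths": 20},
    {"Country": "B", "Total_Cases": 5000, "Total_Deaths": 50},
    {"Country": "C", "Total_Cases": 200, "Total_Deaths": 10},
]

# Build the whole table at once from the list of dictionaries
comparison = pd.DataFrame(records)
comparison["Death_Rate"] = comparison["Total_Deaths"] / comparison["Total_Cases"] * 100
comparison["Cases_per_Death"] = comparison["Total_Cases"] / comparison["Total_Deaths"]
comparison = comparison.sort_values("Total_Cases", ascending=False).reset_index(drop=True)

# Cases_per_Death carries the same information as Death_Rate
reciprocal = 100 / comparison["Death_Rate"]
same_information = ((comparison["Cases_per_Death"] - reciprocal).abs() < 1e-9).all()

print(comparison.to_string(index=False))
print("Cases_per_Death equals 100 / Death_Rate:", same_information)
```

Totals also depend heavily on population size and testing volume, so a per-capita view is worth considering if population figures from a trusted source are available.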
CASE STUDY 2: BBRI Stock Analysis
Case Description
An analysis of the price movement of Bank Rakyat Indonesia (BBRI) shares, focusing on price trends, trading volume, and technical indicators such as moving averages and RSI.
# Install yfinance if needed (the leading "!" is Jupyter/Colab notebook syntax)
!pip install yfinance -q
import yfinance as yf
from datetime import datetime, timedelta
# Download BBRI stock data
ticker = "BBRI.JK"
start_date = "2020-01-01"
end_date = datetime.now().strftime("%Y-%m-%d")
# Fetch data
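stock = yf.download(ticker, start=start_date, end=end_date, progress=False)
# Newer yfinance releases may return MultiIndex columns even for a single ticker;
# flatten them so that stock['Close'] is a Series
if isinstance(stock.columns, pd.MultiIndex):
    stock.columns = stock.columns.get_level_values(0)
print(f"Stock data for {ticker} downloaded successfully")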
print(f"Period: {start_date} to {end_date}")
print(f"Total trading days: {len(stock)}")
# Data preprocessing
stock['Returns'] = stock['Close'].pct_change()
stock['MA20'] = stock['Close'].rolling(window=20).mean()
stock['MA50'] = stock['Close'].rolling(window=50).mean()
stock['MA200'] = stock['Close'].rolling(window=200).mean()
# Volatility (20-day rolling std)
stock['Volatility'] = stock['Returns'].rolling(window=20).std()
# RSI Calculation
def calculate_rsi(data, window=14):
    # RSI variant based on simple moving averages of gains and losses
    delta = data.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
stock['RSI'] = calculate_rsi(stock['Close'])
# Display latest data
print("\nLatest Stock Data:")
print(stock[['Close', 'Volume', 'Returns', 'MA20', 'MA50', 'RSI']].tail())
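The RSI above averages gains and losses with a simple rolling mean. Many charting tools instead use Wilder's smoothing, which behaves like an exponential average with `alpha = 1 / window`. The sketch below is one common pandas approximation of that variant using `ewm`; the function name `rsi_wilder` is hypothetical, and the original Wilder procedure seeds the average with a simple mean, which this sketch does not do.

```python
import pandas as pd

def rsi_wilder(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI with Wilder-style exponential smoothing (ewm approximation)."""
    delta = close.diff()
    gain = delta.clip(lower=0)    # positive moves, zero otherwise
    loss = -delta.clip(upper=0)   # magnitude of negative moves, zero otherwise
    # alpha = 1 / window reproduces the recursive Wilder update
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Made-up price paths to check the extremes
rising = pd.Series(range(1, 31), dtype=float)
falling = pd.Series(range(30, 0, -1), dtype=float)
zigzag = pd.Series([100 + (i % 3) - (i % 5) for i in range(60)], dtype=float)

print(rsi_wilder(rising).dropna().unique())   # only gains: RSI pinned at 100
print(rsi_wilder(falling).dropna().unique())  # only losses: RSI pinned at 0
print(rsi_wilder(zigzag).tail())
```

The two variants generally give different values on the same data, so a report should state which one was used.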
BBRI Stock Visualization
# Create stock analysis dashboard
fig = plt.figure(figsize=(20, 12))
fig.suptitle(f'Stock Analysis Dashboard - {ticker}', fontsize=20, fontweight='bold')
# 1. Price and Moving Averages
ax1 = plt.subplot(3, 2, 1)
ax1.plot(stock.index, stock['Close'], label='Close Price', linewidth=1, color='black')
ax1.plot(stock.index, stock['MA20'], label='MA20', linewidth=1, color='blue', alpha=0.7)
ax1.plot(stock.index, stock['MA50'], label='MA50', linewidth=1, color='orange', alpha=0.7)
ax1.plot(stock.index, stock['MA200'], label='MA200', linewidth=1, color='red', alpha=0.7)
ax1.set_title('Stock Price with Moving Averages', fontsize=14, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Price (IDR)')
ax1.legend()
ax1.grid(True, alpha=0.3)
# 2. Volume
ax2 = plt.subplot(3, 2, 2)
ax2.bar(stock.index, stock['Volume'], color='steelblue', alpha=0.6)
ax2.set_title('Trading Volume', fontsize=14, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('Volume')
ax2.grid(True, alpha=0.3)
# 3. Daily Returns Distribution
ax3 = plt.subplot(3, 2, 3)
ax3.hist(stock['Returns'].dropna(), bins=50, color='green', alpha=0.7, edgecolor='black')
ax3.axvline(stock['Returns'].mean(), color='red', linestyle='--', label=f"Mean: {stock['Returns'].mean():.4f}")
ax3.set_title('Daily Returns Distribution', fontsize=14, fontweight='bold')
ax3.set_xlabel('Returns')
ax3.set_ylabel('Frequency')
ax3.legend()
ax3.grid(True, alpha=0.3)
# 4. RSI Indicator
ax4 = plt.subplot(3, 2, 4)
ax4.plot(stock.index, stock['RSI'], color='purple', linewidth=1.5)
ax4.axhline(70, color='red', linestyle='--', alpha=0.5, label='Overbought')
ax4.axhline(30, color='green', linestyle='--', alpha=0.5, label='Oversold')
ax4.fill_between(stock.index, 30, 70, alpha=0.1, color='gray')
ax4.set_title('RSI (Relative Strength Index)', fontsize=14, fontweight='bold')
ax4.set_xlabel('Date')
ax4.set_ylabel('RSI')
ax4.set_ylim(0, 100)
ax4.legend()
ax4.grid(True, alpha=0.3)
# 5. Volatility over Time
ax5 = plt.subplot(3, 2, 5)
ax5.plot(stock.index, stock['Volatility'], color='red', linewidth=1.5)
ax5.fill_between(stock.index, stock['Volatility'], alpha=0.3, color='red')
ax5.set_title('20-Day Rolling Volatility', fontsize=14, fontweight='bold')
ax5.set_xlabel('Date')
ax5.set_ylabel('Volatility')
ax5.grid(True, alpha=0.3)
# 6. Yearly Performance
stock['Year'] = stock.index.year
yearly_returns = stock.groupby('Year')['Returns'].agg(['mean', 'std', 'sum'])
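# Compound the daily returns within each year; a plain sum ignores compounding
# (the current year is year-to-date only)
yearly_returns['Annual_Return'] = stock.groupby('Year')['Returns'].apply(lambda r: (1 + r).prod() - 1)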
ax6 = plt.subplot(3, 2, 6)
ax6.bar(yearly_returns.index, yearly_returns['Annual_Return'] * 100, color=['green' if x > 0 else 'red' for x in yearly_returns['Annual_Return']])
ax6.set_title('Annual Returns (%)', fontsize=14, fontweight='bold')
ax6.set_xlabel('Year')
ax6.set_ylabel('Return (%)')
ax6.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Summary Statistics
print("\nStock Performance Summary:")
print("="*50)
print(f"Current Price: IDR {stock['Close'].iloc[-1]:,.0f}")
print(f"52-Week High: IDR {stock['Close'].rolling(252).max().iloc[-1]:,.0f}")
print(f"52-Week Low: IDR {stock['Close'].rolling(252).min().iloc[-1]:,.0f}")
print(f"Average Daily Return: {stock['Returns'].mean():.4%}")
print(f"Volatility (Annualized): {stock['Returns'].std() * np.sqrt(252):.2%}")
print(f"Sharpe Ratio: {(stock['Returns'].mean() / stock['Returns'].std()) * np.sqrt(252):.2f}")
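Two points about these summary figures deserve a worked example. First, simple daily returns do not add up to the return over the whole period; they compound. Second, the Sharpe ratio printed above implicitly assumes a risk-free rate of zero. The sketch below uses a made-up price series and a hypothetical risk-free rate of 5% per year (an assumption for illustration only) to show both effects:

```python
import pandas as pd

# Made-up closing prices (not BBRI data)
close = pd.Series([100.0, 110.0, 99.0, 120.0, 90.0, 108.0])
returns = close.pct_change().dropna()

# Summing simple returns overstates the result here
summed = returns.sum()
# Compounding reproduces last price / first price - 1
compounded = (1 + returns).prod() - 1

# Sharpe ratio with and without a risk-free rate
rf_annual = 0.05       # hypothetical annual risk-free rate
trading_days = 252     # common convention for annualizing daily figures
excess = returns - rf_annual / trading_days
sharpe_zero_rf = returns.mean() / returns.std() * trading_days ** 0.5
sharpe_with_rf = excess.mean() / excess.std() * trading_days ** 0.5

print(f"Sum of daily returns: {summed:.2%}")
print(f"Compounded return:    {compounded:.2%}")
print(f"Sharpe (rf = 0):      {sharpe_zero_rf:.2f}")
print(f"Sharpe (rf = 5%):     {sharpe_with_rf:.2f}")
```

With only five returns the Sharpe values themselves are not meaningful; the point is the direction of the adjustment once a positive risk-free rate is subtracted.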
Complete Analysis Pipeline
Data Analysis Workflow Template
- Data Collection: Import from trusted sources
- Data Cleaning: Handle missing values, outliers
- Feature Engineering: Create new meaningful features
- Exploratory Analysis: Statistical summary & patterns
- Visualization: Multiple perspectives & insights
- Interpretation: Business/domain context
- Documentation: Clear reporting & recommendations
# TEMPLATE: Complete Analysis Pipeline
class DataAnalysisPipeline:
    """Template for systematic data analysis"""
    def __init__(self, data_source):
        self.data_source = data_source
        self.df = None
        self.results = {}
    def load_data(self):
        """Step 1: Load and validate data"""
        print("Loading data...")
        # Implementation here
        return self
    def clean_data(self):
        """Step 2: Clean and preprocess"""
        print("Cleaning data...")
        # Handle missing values
        # Remove duplicates
        # Fix data types
        return self
    def explore_data(self):
        """Step 3: Exploratory analysis"""
        print("Exploring data...")
        # Statistical summary
        # Correlation analysis
        # Distribution checks
        return self
    def visualize_insights(self):
        """Step 4: Create visualizations"""
        print("Creating visualizations...")
        # Time series plots
        # Distributions
        # Comparisons
        return self
    def generate_report(self):
        """Step 5: Generate final report"""
        print("Generating report...")
        # Summary statistics
        # Key findings
        # Recommendations
        return self
# Usage example (each method returns self, so the calls can be chained)
pipeline = DataAnalysisPipeline("covid_data.csv")
pipeline.load_data() \
    .clean_data() \
    .explore_data() \
    .visualize_insights() \
    .generate_report()
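The template leaves every method as a stub. As a minimal sketch of how the stubs could be filled in, the class below works on a small in-memory DataFrame instead of a file. The class name `MiniPipeline` and the column names `date` and `value` are hypothetical and chosen only for this illustration; the visualization step is left out to keep the example short.

```python
import pandas as pd

class MiniPipeline:
    """Hypothetical concrete pipeline operating on an in-memory DataFrame."""

    def __init__(self, raw: pd.DataFrame):
        self.raw = raw
        self.df = None
        self.results = {}

    def load_data(self):
        # Work on a copy so the caller's data is never modified
        self.df = self.raw.copy()
        self.results["rows_loaded"] = len(self.df)
        return self

    def clean_data(self):
        # Remove duplicates, drop rows without a value, fix the date type
        self.df = (
            self.df.drop_duplicates()
            .dropna(subset=["value"])
            .assign(date=lambda d: pd.to_datetime(d["date"]))
        )
        self.results["rows_clean"] = len(self.df)
        return self

    def explore_data(self):
        self.results["summary"] = self.df["value"].describe()
        return self

    def generate_report(self):
        summary = self.results["summary"]
        self.results["report"] = (
            f"{self.results['rows_clean']} of {self.results['rows_loaded']} rows kept; "
            f"mean value {summary['mean']:.1f}"
        )
        return self

# Made-up input with one duplicate row and one missing value
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"],
    "value": [10.0, 20.0, 20.0, None],
})

pipeline = MiniPipeline(raw).load_data().clean_data().explore_data().generate_report()
print(pipeline.results["report"])
```

Storing intermediate figures in `results` keeps the report step free of recomputation and makes each stage easy to inspect on its own.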
Example Problems and Exercises
Exercise 1: Regional COVID-19 Analysis
- Download COVID-19 data for the provinces of Indonesia
- Identify the 5 provinces with the most cases
- Compute and visualize the weekly trend
- Compare the death rate across provinces
- Build an interactive dashboard with at least 4 visualizations
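For the weekly trend in this exercise, one possible approach is `resample` on a date-indexed table. The sketch below assumes made-up daily counts for two hypothetical provinces; real provincial data must be obtained separately, and its column layout may differ.

```python
import pandas as pd

# Made-up daily new cases for two hypothetical provinces over 14 days
dates = pd.date_range("2021-07-05", periods=14, freq="D")  # 2021-07-05 is a Monday
daily = pd.DataFrame(
    {
        "Province A": [10] * 7 + [20] * 7,
        "Province B": [5] * 7 + [3] * 7,
    },
    index=dates,
)

# Weekly totals; "W-SUN" closes each bin on Sunday, giving Monday-Sunday weeks
weekly = daily.resample("W-SUN").sum()

# Week-over-week change in percent (first week has no predecessor, so NaN)
weekly_change = weekly.pct_change() * 100

print(weekly)
print(weekly_change.round(1))
```

Weekly aggregation also smooths out day-of-week reporting effects, which tend to make daily charts look noisier than the underlying trend.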
Exercise 2: Stock Portfolio Analysis
- Choose 5 stocks from different sectors (BBCA, TLKM, UNVR, ASII, ICBP)
- Download the last 2 years of data
- Compute the correlation matrix between the stocks
- Visualize a performance comparison
- Compute the portfolio return with equal weights
- Identify the best and worst performers
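The calculations in this exercise can be rehearsed on simulated prices before downloading anything. The sketch below uses randomly generated returns and placeholder names (`AAA`, `BBB`, `CCC` are not real tickers), and it assumes an equal-weight portfolio that is rebalanced every day, which is one of several possible interpretations of "equal weight".

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 250
tickers = ["AAA", "BBB", "CCC"]  # placeholder names, not real tickers

# Simulated daily returns, turned into prices by compounding from a base of 1000
sim_returns = pd.DataFrame(rng.normal(0.0005, 0.01, size=(n_days, 3)), columns=tickers)
prices = 1000 * (1 + sim_returns).cumprod()

# Daily returns and their correlation matrix
returns = prices.pct_change().dropna()
corr = returns.corr()

# Equal weights, rebalanced daily: the portfolio return is the row average
weights = np.full(len(tickers), 1 / len(tickers))
portfolio_returns = returns.dot(weights)
portfolio_total = (1 + portfolio_returns).prod() - 1

# Total return per stock over the period, then best and worst performer
total_by_ticker = prices.iloc[-1] / prices.iloc[0] - 1
best, worst = total_by_ticker.idxmax(), total_by_ticker.idxmin()

print(corr.round(2))
print(f"Portfolio total return: {portfolio_total:.2%}")
print(f"Best: {best}, worst: {worst}")
```

Correlations should be computed on returns rather than on price levels, since trending prices tend to look correlated even when their daily movements are unrelated.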
Exercise 3: E-Commerce Analysis
- Generate synthetic e-commerce data (1000+ transactions)
- Analyze customer segmentation (RFM analysis)
- Identify top products and categories
- Calculate conversion rates
- Create sales forecasting model
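As a starting point for the first two items, the sketch below generates synthetic transactions and computes a basic RFM (recency, frequency, monetary) table. Every value is randomly generated; the column names, the gamma distribution for amounts, and the 1-3 tercile scoring are assumptions made for illustration, not a fixed standard.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic transactions: customer, order date within one year, order amount
tx = pd.DataFrame({
    "customer_id": rng.integers(1, 101, size=n),
    "order_date": pd.Timestamp("2024-01-01")
    + pd.to_timedelta(rng.integers(0, 365, size=n), unit="D"),
    "amount": rng.gamma(shape=2.0, scale=50_000, size=n).round(-2),
})

# Reference date: one day after the last order in the data
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Score each dimension 1-3 by tercile; ranking first avoids duplicate bin edges
# Lower recency (more recent purchase) earns the higher score
rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 3, labels=[3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 3, labels=[1, 2, 3]).astype(int)
rfm["RFM_score"] = rfm["R"] + rfm["F"] + rfm["M"]

print(rfm.sort_values("RFM_score", ascending=False).head())
```

Customers with the highest combined score bought recently, often, and for large totals; segment names and cut-offs are a design decision to justify in the report.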
Tips for a Successful Analysis:
- Start with questions: Define the business questions you want to answer
- Know your data: Understand the context and limitations of the data
- Iterate quickly: Start simple and add complexity gradually
- Validate findings: Cross-check against other sources
- Tell a story: Build a narrative that is logical and easy to follow
- Be actionable: Give concrete recommendations
Analysis Report Format
Mini Project Report Template
# DATA ANALYSIS REPORT [TITLE]
## 1. EXECUTIVE SUMMARY
- Key findings (3-5 bullet points)
- Main recommendations
## 2. INTRODUCTION
- Background & context
- Objectives
- Data sources
## 3. METHODOLOGY
- Data collection process
- Cleaning steps taken
- Analysis techniques used
## 4. FINDINGS
### 4.1 Descriptive Statistics
- Summary tables
- Key metrics
### 4.2 Trend Analysis
- Time series patterns
- Seasonality
### 4.3 Comparative Analysis
- Benchmarking
- Segmentation
## 5. VISUALIZATIONS
- Include 5-8 key charts
- Clear labels and legends
## 6. INSIGHTS & RECOMMENDATIONS
- Business implications
- Action items
- Future research
## 7. APPENDIX
- Technical details
- Code snippets
- Data dictionary
Midterm Exam (UTS) Preparation
Skills Checklist for the Midterm Exam
- Import and load data in various formats
- Identify and handle missing values
- Perform systematic data cleaning
- Create at least 5 types of visualization
- Compute descriptive statistics
- Interpret analysis results
- Write a structured report
- Provide actionable insights
Conclusion
In this session, students have learned:
- End-to-end implementation of a data analysis
- Handling complex real-world datasets
- Advanced visualization techniques
- Comparative analysis across entities
- A reusable analysis pipeline
- Best practices in reporting
Next session: Midterm exam (UTS) - an independent data analysis project