📘 Session 7 - Mini Case Study: Real-World Data Analysis

🎯 Learning Objectives

Apply all the concepts from sessions 1-6 in one complete analysis
Perform exploratory data analysis on a real public dataset
Produce informative visualizations and actionable insights
Write a structured analysis report
Handle the complexity of real-world data

🧩 Case Study Concept

In this session, we integrate all the skills learned so far:

📥 Data Import & Setup - Preparing the environment and loading the data
🔍 Exploratory Data Analysis - Understanding the characteristics of the data
🧹 Data Cleaning - Cleaning and preparing the data
📊 Visualization - Building informative charts
📈 Statistical Analysis - In-depth analysis
💡 Insights & Conclusions - Drawing conclusions

📁 Case Study Dataset Options

🦠 CASE STUDY 1: COVID-19 Analysis for Indonesia

📋 Case Description

An analysis of how the COVID-19 pandemic developed in Indonesia, using data from Johns Hopkins CSSE. The focus is on case trends, the recovery rate, and a comparison with other ASEAN countries.

Step 1: Setup and Data Import

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Setup style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

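# Load COVID-19 data
# Note: the JHU CSSE repository stopped updating in March 2023, and its recovered
# series is not populated after August 2021, so treat Recovered, Active, and
# Recovery_Rate with care for later dates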
url_confirmed = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
url_deaths = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
url_recovered = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"

# Read data
df_confirmed = pd.read_csv(url_confirmed)
df_deaths = pd.read_csv(url_deaths)
df_recovered = pd.read_csv(url_recovered)

print("✅ Data loaded successfully!")
print(f"Shape - Confirmed: {df_confirmed.shape}")
print(f"Shape - Deaths: {df_deaths.shape}")
print(f"Shape - Recovered: {df_recovered.shape}")

Step 2: Data Preprocessing

# Build one time series per country from the three wide tables
def process_country_data(country_name):
    # Filter by country
    confirmed = df_confirmed[df_confirmed['Country/Region'] == country_name]
    deaths = df_deaths[df_deaths['Country/Region'] == country_name]
    recovered = df_recovered[df_recovered['Country/Region'] == country_name]

    # A misspelled or missing country would otherwise silently sum to all zeros
    if confirmed.empty:
        raise ValueError(f"No rows found for country: {country_name}")

    # Date columns follow the 4 metadata columns
    # (Province/State, Country/Region, Lat, Long)
    dates = df_confirmed.columns[4:]

    # Aggregate if multiple provinces
    confirmed_values = confirmed[dates].sum()
    deaths_values = deaths[dates].sum()
    recovered_values = recovered[dates].sum()

    # Create dataframe (column headers look like 1/22/20, i.e. month/day/year)
    df_country = pd.DataFrame({
        'Date': pd.to_datetime(dates, format='%m/%d/%y'),
        'Confirmed': confirmed_values.values,
        'Deaths': deaths_values.values,
        'Recovered': recovered_values.values
    })

    # Calculate additional metrics
    # Rates are NaN while Confirmed is still 0 (0/0), i.e. undefined
    df_country['Active'] = df_country['Confirmed'] - df_country['Deaths'] - df_country['Recovered']
    df_country['Daily_Confirmed'] = df_country['Confirmed'].diff().fillna(0)
    df_country['Daily_Deaths'] = df_country['Deaths'].diff().fillna(0)
    df_country['Death_Rate'] = (df_country['Deaths'] / df_country['Confirmed'] * 100).round(2)
    df_country['Recovery_Rate'] = (df_country['Recovered'] / df_country['Confirmed'] * 100).round(2)

    return df_country

# Process Indonesia data
df_indonesia = process_country_data('Indonesia')
print("\n📊 Indonesia COVID-19 Data Processed")
print(df_indonesia.tail())
print(f"\nTotal Days: {len(df_indonesia)}")
print(f"Date Range: {df_indonesia['Date'].min()} to {df_indonesia['Date'].max()}")
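To see what the province aggregation and the daily difference do without downloading anything, here is a minimal sketch on a made-up table in the same wide layout as the JHU files. The `toy_confirmed` name and every number in it are invented for illustration.

```python
import pandas as pd

# Toy table in the JHU wide layout: 4 metadata columns, then one column per date.
# All numbers are made up for illustration.
toy_confirmed = pd.DataFrame({
    'Province/State': ['A', 'B', None],
    'Country/Region': ['Australia', 'Australia', 'Indonesia'],
    'Lat': [0.0, 0.0, 0.0],
    'Long': [0.0, 0.0, 0.0],
    '1/22/20': [1, 2, 0],
    '1/23/20': [3, 5, 2],
    '1/24/20': [6, 9, 2],
})

date_cols = toy_confirmed.columns[4:]

# Sum all provinces of one country into a single cumulative series
rows = toy_confirmed[toy_confirmed['Country/Region'] == 'Australia']
cumulative = rows[date_cols].sum()
cumulative.index = pd.to_datetime(cumulative.index, format='%m/%d/%y')

# Daily new cases are the day-to-day difference of the cumulative series
daily = cumulative.diff().fillna(0)

toy_result = pd.DataFrame({'Confirmed': cumulative, 'Daily_Confirmed': daily})
print(toy_result)
```

The two Australian rows collapse into one cumulative series, and the first daily value is filled with 0 because there is no previous day to subtract.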

Step 3: Exploratory Data Analysis

# Statistical Summary
print("="*60)
print("📈 STATISTICAL SUMMARY - INDONESIA COVID-19")
print("="*60)

# Latest statistics
latest = df_indonesia.iloc[-1]
print(f"\n🔴 Latest Statistics ({latest['Date'].strftime('%Y-%m-%d')}):")
print(f"  Total Confirmed: {latest['Confirmed']:,}")
print(f"  Total Deaths: {latest['Deaths']:,}")
print(f"  Total Recovered: {latest['Recovered']:,}")
print(f"  Active Cases: {latest['Active']:,}")
print(f"  Death Rate: {latest['Death_Rate']:.2f}%")
print(f"  Recovery Rate: {latest['Recovery_Rate']:.2f}%")

# Peak statistics (idxmax returns the label of the first row holding the maximum)
print("\n📊 Peak Statistics:")
peak_idx = df_indonesia['Daily_Confirmed'].idxmax()
peak_daily = df_indonesia.loc[peak_idx, 'Daily_Confirmed']
peak_date = df_indonesia.loc[peak_idx, 'Date']
print(f"  Highest Daily Cases: {peak_daily:,.0f} on {peak_date.strftime('%Y-%m-%d')}")

peak_active_idx = df_indonesia['Active'].idxmax()
peak_active = df_indonesia.loc[peak_active_idx, 'Active']
peak_active_date = df_indonesia.loc[peak_active_idx, 'Date']
print(f"  Highest Active Cases: {peak_active:,.0f} on {peak_active_date.strftime('%Y-%m-%d')}")

# Moving averages
df_indonesia['MA7_Confirmed'] = df_indonesia['Daily_Confirmed'].rolling(window=7).mean()
df_indonesia['MA30_Confirmed'] = df_indonesia['Daily_Confirmed'].rolling(window=30).mean()

print("\n✅ Analysis complete!")
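The moving averages and the peak lookup can be tried on a small series first. The sketch below uses made-up daily counts (the `daily_cases` values are not real data) to show why a 7-day window starts with missing values, and how `idxmax` finds the peak date.

```python
import pandas as pd

# Hypothetical daily counts, made up for illustration
daily_cases = pd.Series(
    [10, 12, 8, 15, 20, 18, 22, 30, 25, 28],
    index=pd.date_range('2021-01-01', periods=10, freq='D'),
)

# A 7-day window needs 7 observations, so the first 6 values are NaN
ma7 = daily_cases.rolling(window=7).mean()

# min_periods=1 averages whatever is available so far instead
ma7_partial = daily_cases.rolling(window=7, min_periods=1).mean()

# idxmax returns the index label (here a date) of the largest value
peak_day = daily_cases.idxmax()

print(ma7.tail(4))
print(f"Peak: {daily_cases.max()} on {peak_day:%Y-%m-%d}")
```

Whether to use `min_periods=1` is a judgment call: it removes the gap at the start of the chart, but the early averages are based on fewer days and are noisier.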

Step 4: Comprehensive Visualization

# Create comprehensive dashboard
fig = plt.figure(figsize=(20, 12))
fig.suptitle('COVID-19 Dashboard - Indonesia', fontsize=20, fontweight='bold')

# 1. Total Cases Timeline
ax1 = plt.subplot(2, 3, 1)
ax1.plot(df_indonesia['Date'], df_indonesia['Confirmed'], label='Confirmed', linewidth=2, color='#FF6B6B')
ax1.plot(df_indonesia['Date'], df_indonesia['Recovered'], label='Recovered', linewidth=2, color='#4ECDC4')
ax1.plot(df_indonesia['Date'], df_indonesia['Deaths'], label='Deaths', linewidth=2, color='#45B7D1')
ax1.set_title('Cumulative Cases Over Time', fontsize=14, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Number of Cases')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Daily New Cases
ax2 = plt.subplot(2, 3, 2)
ax2.bar(df_indonesia['Date'], df_indonesia['Daily_Confirmed'], color='#FFA07A', alpha=0.5, label='Daily Cases')
ax2.plot(df_indonesia['Date'], df_indonesia['MA7_Confirmed'], color='red', linewidth=2, label='7-Day MA')
ax2.plot(df_indonesia['Date'], df_indonesia['MA30_Confirmed'], color='darkred', linewidth=2, label='30-Day MA')
ax2.set_title('Daily New Cases with Moving Averages', fontsize=14, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('New Cases')
ax2.legend()
ax2.grid(True, alpha=0.3)

# 3. Active Cases
ax3 = plt.subplot(2, 3, 3)
ax3.fill_between(df_indonesia['Date'], df_indonesia['Active'], color='#FFD93D', alpha=0.6)
ax3.plot(df_indonesia['Date'], df_indonesia['Active'], color='#F39C12', linewidth=2)
ax3.set_title('Active Cases Over Time', fontsize=14, fontweight='bold')
ax3.set_xlabel('Date')
ax3.set_ylabel('Active Cases')
ax3.grid(True, alpha=0.3)

# 4. Death and Recovery Rates
ax4 = plt.subplot(2, 3, 4)
ax4.plot(df_indonesia['Date'], df_indonesia['Death_Rate'], label='Death Rate (%)', linewidth=2, color='#E74C3C')
ax4.plot(df_indonesia['Date'], df_indonesia['Recovery_Rate'], label='Recovery Rate (%)', linewidth=2, color='#27AE60')
ax4.set_title('Death Rate vs Recovery Rate', fontsize=14, fontweight='bold')
ax4.set_xlabel('Date')
ax4.set_ylabel('Percentage (%)')
ax4.legend()
ax4.grid(True, alpha=0.3)

# 5. Monthly Aggregation
df_indonesia['Month'] = df_indonesia['Date'].dt.to_period('M')
monthly = df_indonesia.groupby('Month').agg({
    'Daily_Confirmed': 'sum',
    'Daily_Deaths': 'sum'
}).tail(12)

ax5 = plt.subplot(2, 3, 5)
x = range(len(monthly))
width = 0.35
ax5.bar([i - width/2 for i in x], monthly['Daily_Confirmed'], width, label='Confirmed', color='#3498DB')
ax5.bar([i + width/2 for i in x], monthly['Daily_Deaths'], width, label='Deaths', color='#E74C3C')
ax5.set_title('Monthly Cases (Last 12 Months)', fontsize=14, fontweight='bold')
ax5.set_xticks(x)
ax5.set_xticklabels([str(m) for m in monthly.index], rotation=45)
ax5.legend()
ax5.grid(True, alpha=0.3)

# 6. Case Distribution Pie Chart
ax6 = plt.subplot(2, 3, 6)
latest_data = df_indonesia.iloc[-1]
sizes = [latest_data['Recovered'], latest_data['Deaths'], latest_data['Active']]
labels = ['Recovered', 'Deaths', 'Active']
colors = ['#27AE60', '#E74C3C', '#F39C12']
explode = (0.1, 0, 0)

ax6.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax6.set_title('Current Case Distribution', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("\n📊 Visualizations complete!")

Step 5: ASEAN Comparative Analysis

# Compare with ASEAN countries
asean_countries = ['Indonesia', 'Malaysia', 'Singapore', 'Thailand', 'Philippines', 'Vietnam']

# Process all ASEAN data
asean_data = {}
for country in asean_countries:
    try:
        asean_data[country] = process_country_data(country)
    except (KeyError, ValueError) as exc:
        # Catch only the expected failures instead of a bare except
        print(f"⚠️ Data not available for {country}: {exc}")

# Create comparison dataframe from the latest row of each country
comparison_rows = []
for country, data in asean_data.items():
    latest = data.iloc[-1]
    comparison_rows.append({
        'Country': country,
        'Total_Cases': latest['Confirmed'],
        'Total_Deaths': latest['Deaths'],
        'Death_Rate': latest['Death_Rate'],
        'Recovery_Rate': latest['Recovery_Rate']
    })
comparison_df = pd.DataFrame(comparison_rows)

# Sort by total cases
comparison_df = comparison_df.sort_values('Total_Cases', ascending=False)

# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('ASEAN COVID-19 Comparison', fontsize=16, fontweight='bold')

# Total Cases
axes[0, 0].barh(comparison_df['Country'], comparison_df['Total_Cases'], color='skyblue')
axes[0, 0].set_title('Total Confirmed Cases')
axes[0, 0].set_xlabel('Cases')

# Death Rate
axes[0, 1].barh(comparison_df['Country'], comparison_df['Death_Rate'], color='salmon')
axes[0, 1].set_title('Death Rate (%)')
axes[0, 1].set_xlabel('Percentage')

# Recovery Rate
axes[1, 0].barh(comparison_df['Country'], comparison_df['Recovery_Rate'], color='lightgreen')
axes[1, 0].set_title('Recovery Rate (%)')
axes[1, 0].set_xlabel('Percentage')

# Cases per Death ratio
comparison_df['Cases_per_Death'] = comparison_df['Total_Cases'] / comparison_df['Total_Deaths']
axes[1, 1].barh(comparison_df['Country'], comparison_df['Cases_per_Death'], color='gold')
axes[1, 1].set_title('Cases per Death (Higher is Better)')
axes[1, 1].set_xlabel('Ratio')

plt.tight_layout()
plt.show()

print("\n📊 ASEAN Comparison:")
print(comparison_df.to_string(index=False))

📈 CASE STUDY 2: BBRI Stock Analysis

📋 Case Description

An analysis of the price movement of Bank Rakyat Indonesia (BBRI) shares, focusing on price trends, trading volume, and technical indicators such as moving averages and the RSI.

# Install yfinance if needed
!pip install yfinance -q

import yfinance as yf
from datetime import datetime, timedelta

# Download BBRI stock data
ticker = "BBRI.JK"
start_date = "2020-01-01"
end_date = datetime.now().strftime("%Y-%m-%d")

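# Fetch data
stock = yf.download(ticker, start=start_date, end=end_date, progress=False)

# Newer yfinance releases may return MultiIndex columns (field, ticker) even for
# a single ticker; flatten them so stock['Close'] is a plain Series
if isinstance(stock.columns, pd.MultiIndex):
    stock.columns = stock.columns.get_level_values(0)

print(f"✅ Stock data for {ticker} downloaded successfully")
print(f"Period: {start_date} to {end_date}")
print(f"Total trading days: {len(stock)}")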

# Data preprocessing
stock['Returns'] = stock['Close'].pct_change()
stock['MA20'] = stock['Close'].rolling(window=20).mean()
stock['MA50'] = stock['Close'].rolling(window=50).mean()
stock['MA200'] = stock['Close'].rolling(window=200).mean()

# Volatility (20-day rolling std)
stock['Volatility'] = stock['Returns'].rolling(window=20).std()

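# RSI calculation
# Note: this version averages gains and losses with a simple rolling mean;
# Wilder's original RSI uses exponential smoothing, so other tools may show
# slightly different values
def calculate_rsi(data, window=14):
    delta = data.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss  # a window with no losses gives inf, which maps to RSI = 100
    rsi = 100 - (100 / (1 + rs))
    return rsi

stock['RSI'] = calculate_rsi(stock['Close'])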

# Display latest data
print("\n📊 Latest Stock Data:")
print(stock[['Close', 'Volume', 'Returns', 'MA20', 'MA50', 'RSI']].tail())
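`calculate_rsi` averages gains and losses with a simple rolling mean. For comparison, here is a sketch of the exponentially smoothed variant usually attributed to Wilder, approximated with `ewm(alpha=1/window)`. The function name `calculate_rsi_wilder` and the synthetic random-walk prices are assumptions for illustration; this is not what the case study itself uses.

```python
import numpy as np
import pandas as pd

def calculate_rsi_wilder(close, window=14):
    """RSI with exponential smoothing (alpha = 1 / window), a common
    approximation of Wilder's method."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # positive moves, zero otherwise
    loss = (-delta).clip(lower=0)    # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Synthetic random-walk prices, made up for illustration
rng = np.random.default_rng(42)
prices = pd.Series(4000 + rng.normal(0, 50, size=120).cumsum())

rsi = calculate_rsi_wilder(prices)
print(rsi.dropna().describe())
```

Both variants stay between 0 and 100 and are read the same way (above 70 overbought, below 30 oversold); the smoothed version simply reacts less abruptly when an old observation leaves the window.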

BBRI Stock Visualization

# Create stock analysis dashboard
fig = plt.figure(figsize=(20, 12))
fig.suptitle(f'Stock Analysis Dashboard - {ticker}', fontsize=20, fontweight='bold')

# 1. Price and Moving Averages
ax1 = plt.subplot(3, 2, 1)
ax1.plot(stock.index, stock['Close'], label='Close Price', linewidth=1, color='black')
ax1.plot(stock.index, stock['MA20'], label='MA20', linewidth=1, color='blue', alpha=0.7)
ax1.plot(stock.index, stock['MA50'], label='MA50', linewidth=1, color='orange', alpha=0.7)
ax1.plot(stock.index, stock['MA200'], label='MA200', linewidth=1, color='red', alpha=0.7)
ax1.set_title('Stock Price with Moving Averages', fontsize=14, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Price (IDR)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Volume
ax2 = plt.subplot(3, 2, 2)
ax2.bar(stock.index, stock['Volume'], color='steelblue', alpha=0.6)
ax2.set_title('Trading Volume', fontsize=14, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('Volume')
ax2.grid(True, alpha=0.3)

# 3. Daily Returns Distribution
ax3 = plt.subplot(3, 2, 3)
ax3.hist(stock['Returns'].dropna(), bins=50, color='green', alpha=0.7, edgecolor='black')
ax3.axvline(stock['Returns'].mean(), color='red', linestyle='--', label=f"Mean: {stock['Returns'].mean():.4f}")
ax3.set_title('Daily Returns Distribution', fontsize=14, fontweight='bold')
ax3.set_xlabel('Returns')
ax3.set_ylabel('Frequency')
ax3.legend()
ax3.grid(True, alpha=0.3)

# 4. RSI Indicator
ax4 = plt.subplot(3, 2, 4)
ax4.plot(stock.index, stock['RSI'], color='purple', linewidth=1.5)
ax4.axhline(70, color='red', linestyle='--', alpha=0.5, label='Overbought')
ax4.axhline(30, color='green', linestyle='--', alpha=0.5, label='Oversold')
ax4.fill_between(stock.index, 30, 70, alpha=0.1, color='gray')
ax4.set_title('RSI (Relative Strength Index)', fontsize=14, fontweight='bold')
ax4.set_xlabel('Date')
ax4.set_ylabel('RSI')
ax4.set_ylim(0, 100)
ax4.legend()
ax4.grid(True, alpha=0.3)

# 5. Volatility over Time
ax5 = plt.subplot(3, 2, 5)
ax5.plot(stock.index, stock['Volatility'], color='red', linewidth=1.5)
ax5.fill_between(stock.index, stock['Volatility'], alpha=0.3, color='red')
ax5.set_title('20-Day Rolling Volatility', fontsize=14, fontweight='bold')
ax5.set_xlabel('Date')
ax5.set_ylabel('Volatility')
ax5.grid(True, alpha=0.3)

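# 6. Yearly Performance
stock['Year'] = stock.index.year
yearly_returns = stock.groupby('Year')['Returns'].agg(['mean', 'std'])
# Compound the daily returns within each year; summing them would ignore compounding
yearly_returns['Annual_Return'] = stock.groupby('Year')['Returns'].apply(lambda r: (1 + r).prod() - 1)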

ax6 = plt.subplot(3, 2, 6)
ax6.bar(yearly_returns.index, yearly_returns['Annual_Return'] * 100, color=['green' if x > 0 else 'red' for x in yearly_returns['Annual_Return']])
ax6.set_title('Annual Returns (%)', fontsize=14, fontweight='bold')
ax6.set_xlabel('Year')
ax6.set_ylabel('Return (%)')
ax6.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary Statistics
print("\n📈 Stock Performance Summary:")
print("="*50)
print(f"Current Price: IDR {stock['Close'].iloc[-1]:,.0f}")
# Last 252 trading days approximate 52 weeks; based on closing prices
print(f"52-Week High (close): IDR {stock['Close'].tail(252).max():,.0f}")
print(f"52-Week Low (close): IDR {stock['Close'].tail(252).min():,.0f}")
print(f"Average Daily Return: {stock['Returns'].mean():.4%}")
print(f"Volatility (Annualized): {stock['Returns'].std() * np.sqrt(252):.2%}")
# Sharpe ratio here assumes a risk-free rate of 0
print(f"Sharpe Ratio (rf = 0): {(stock['Returns'].mean() / stock['Returns'].std()) * np.sqrt(252):.2f}")
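Two details behind these summary numbers are worth checking on small inputs: period returns compound rather than add, and the Sharpe ratio depends on the risk-free rate you assume. The sketch below is illustrative only; `annualized_sharpe`, the 6% rate, and the synthetic returns are assumptions, not part of the case study.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # common convention for annualizing daily figures

# Two days of +10% then -10%: summing gives 0%, compounding gives -1%
two_days = pd.Series([0.10, -0.10])
summed = two_days.sum()
compounded = (1 + two_days).prod() - 1

def annualized_sharpe(daily_returns, annual_risk_free=0.0):
    """Annualized Sharpe ratio from daily simple returns."""
    daily_rf = annual_risk_free / TRADING_DAYS
    excess = daily_returns - daily_rf
    return excess.mean() / excess.std() * np.sqrt(TRADING_DAYS)

# Synthetic daily returns, made up for illustration
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0005, 0.015, size=TRADING_DAYS * 2))

sharpe_zero_rf = annualized_sharpe(returns)
sharpe_with_rf = annualized_sharpe(returns, annual_risk_free=0.06)  # 6% is a placeholder

print(f"Summed: {summed:.2%}, compounded: {compounded:.2%}")
print(f"Sharpe (rf = 0):  {sharpe_zero_rf:.2f}")
print(f"Sharpe (rf = 6%): {sharpe_with_rf:.2f}")
```

A positive risk-free rate always lowers the ratio, so a Sharpe figure is only comparable across reports that state the same assumption.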

🛠️ Complete Analysis Pipeline

📝 Data Analysis Workflow Template

Data Collection: Import from trusted sources
Data Cleaning: Handle missing values and outliers
Feature Engineering: Create new, meaningful features
Exploratory Analysis: Statistical summary and patterns
Visualization: Multiple perspectives and insights
Interpretation: Business/domain context
Documentation: Clear reporting and recommendations

# TEMPLATE: Complete Analysis Pipeline
class DataAnalysisPipeline:
    """Template for systematic data analysis"""
    
    def __init__(self, data_source):
        self.data_source = data_source
        self.df = None
        self.results = {}
    
    def load_data(self):
        """Step 1: Load and validate data"""
        print("📥 Loading data...")
        # Implementation here
        return self
    
    def clean_data(self):
        """Step 2: Clean and preprocess"""
        print("🧹 Cleaning data...")
        # Handle missing values
        # Remove duplicates
        # Fix data types
        return self
    
    def explore_data(self):
        """Step 3: Exploratory analysis"""
        print("🔍 Exploring data...")
        # Statistical summary
        # Correlation analysis
        # Distribution checks
        return self
    
    def visualize_insights(self):
        """Step 4: Create visualizations"""
        print("📊 Creating visualizations...")
        # Time series plots
        # Distributions
        # Comparisons
        return self
    
    def generate_report(self):
        """Step 5: Generate final report"""
        print("📝 Generating report...")
        # Summary statistics
        # Key findings
        # Recommendations
        return self

# Usage example (parentheses allow the chain to span lines without backslashes)
pipeline = DataAnalysisPipeline("covid_data.csv")
(
    pipeline.load_data()
    .clean_data()
    .explore_data()
    .visualize_insights()
    .generate_report()
)
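The template leaves every step empty. As one possible way to fill in the first three steps, here is a minimal sketch; the class name `CsvSummaryPipeline`, the cleaning policy (drop duplicates, then drop rows with missing values), and the in-memory CSV are all assumptions chosen to keep the example self-contained.

```python
import io
import pandas as pd

class CsvSummaryPipeline:
    """Small concrete pipeline: load a CSV, clean it, summarize numeric columns."""

    def __init__(self, data_source):
        self.data_source = data_source  # path, URL, or file-like object
        self.df = None
        self.results = {}

    def load_data(self):
        self.df = pd.read_csv(self.data_source)
        self.results['rows_loaded'] = len(self.df)
        return self

    def clean_data(self):
        # Drop exact duplicate rows, then rows with any missing value
        self.df = self.df.drop_duplicates().dropna()
        self.results['rows_clean'] = len(self.df)
        return self

    def explore_data(self):
        self.results['summary'] = self.df.describe()
        return self

# In-memory CSV standing in for a real file; values are made up
raw_csv = io.StringIO(
    "date,cases\n"
    "2021-01-01,10\n"
    "2021-01-01,10\n"
    "2021-01-02,\n"
    "2021-01-03,14\n"
)

demo = CsvSummaryPipeline(raw_csv).load_data().clean_data().explore_data()
print(demo.results['rows_loaded'], demo.results['rows_clean'])
print(demo.results['summary'])
```

Storing intermediate counts in `results` makes it easy to report how many rows each cleaning step removed, which belongs in the methodology section of the final report.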

📊 Example Problems and Exercises

Exercise 1: Regional COVID-19 Analysis

Download COVID-19 data for the provinces of Indonesia
Identify the 5 provinces with the most cases
Compute and visualize the weekly trend
Compare the death rate across provinces
Build an interactive dashboard with at least 4 visualizations
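For the weekly trend step, one possible starting point is `resample`. This is a sketch on made-up daily counts for a single hypothetical province; the actual exercise data and its column names will differ.

```python
import pandas as pd

# Hypothetical daily new-case counts for one province (made-up numbers)
daily = pd.Series(
    [5, 7, 6, 9, 12, 10, 8, 15, 14, 18, 20, 17, 22, 25],
    index=pd.date_range('2021-03-01', periods=14, freq='D'),  # starts on a Monday
    name='Daily_Confirmed',
)

# 'W-SUN' closes each weekly bin on Sunday, so each bin covers Monday to Sunday
weekly = daily.resample('W-SUN').sum()

# Week-over-week change in percent; the first week has no predecessor
weekly_change = weekly.pct_change() * 100

print(weekly)
print(weekly_change.round(1))
```

Weekly totals smooth out day-of-week reporting effects, which usually makes provinces easier to compare than raw daily counts.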

Exercise 2: Stock Portfolio Analysis

Pick 5 stocks from different sectors (BBCA, TLKM, UNVR, ASII, ICBP)
Download the last 2 years of data
Compute the correlation matrix between the stocks
Visualize a performance comparison
Compute the portfolio return with equal weights
Identify the best and worst performers
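The correlation and equal-weight steps can be prototyped before downloading anything. The sketch below uses randomly generated returns in place of real prices, so the numbers mean nothing; only the ticker labels come from the exercise.

```python
import numpy as np
import pandas as pd

tickers = ['BBCA', 'TLKM', 'UNVR', 'ASII', 'ICBP']

# Synthetic daily returns standing in for downloaded prices (made-up data)
rng = np.random.default_rng(7)
returns = pd.DataFrame(
    rng.normal(0.0004, 0.012, size=(500, len(tickers))),
    columns=tickers,
)

# Pairwise Pearson correlation of daily returns
corr_matrix = returns.corr()

# Equal weight: each stock gets 1/5 of the portfolio, rebalanced daily
weights = np.full(len(tickers), 1 / len(tickers))
portfolio_returns = returns.dot(weights)

# Compounded total return per stock, then best and worst performer
total_return = (1 + returns).prod() - 1
best, worst = total_return.idxmax(), total_return.idxmin()

print(corr_matrix.round(2))
print(f"Portfolio total return: {(1 + portfolio_returns).prod() - 1:.2%}")
print(f"Best: {best}, worst: {worst}")
```

With real data, compute `returns` from closing prices with `pct_change()` and drop the first row, which has no previous price.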

Exercise 3: E-Commerce Analysis

Generate synthetic e-commerce data (1000+ transactions)
Analyze customer segmentation (RFM analysis)
Identify top products and categories
Calculate conversion rates
Create a sales forecasting model
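As a possible starting point for the first two steps, the sketch below generates random transactions and scores each customer on recency, frequency, and monetary value. The column names, the gamma-distributed amounts, and the quartile scoring scheme are all assumptions; other RFM conventions are equally valid.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
n_transactions = 1000

# Synthetic transactions; every value here is randomly generated
transactions = pd.DataFrame({
    'customer_id': rng.integers(1, 201, size=n_transactions),
    'order_date': pd.Timestamp('2024-01-01')
                  + pd.to_timedelta(rng.integers(0, 365, size=n_transactions), unit='D'),
    'amount': rng.gamma(shape=2.0, scale=150_000, size=n_transactions).round(-2),
})

# Reference date: one day after the last order in the data
snapshot = transactions['order_date'].max() + pd.Timedelta(days=1)

rfm = transactions.groupby('customer_id').agg(
    recency=('order_date', lambda d: (snapshot - d.max()).days),
    frequency=('order_date', 'count'),
    monetary=('amount', 'sum'),
)

# Score each dimension 1-4 by quartile; ranking first avoids duplicate bin edges
rfm['R'] = pd.qcut(rfm['recency'].rank(method='first'), 4, labels=[4, 3, 2, 1]).astype(int)
rfm['F'] = pd.qcut(rfm['frequency'].rank(method='first'), 4, labels=[1, 2, 3, 4]).astype(int)
rfm['M'] = pd.qcut(rfm['monetary'].rank(method='first'), 4, labels=[1, 2, 3, 4]).astype(int)
rfm['RFM_Score'] = rfm['R'] + rfm['F'] + rfm['M']

print(rfm.sort_values('RFM_Score', ascending=False).head())
```

Recency is scored in reverse (a recent purchase earns a 4) so that a higher total always means a more valuable customer.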

🎯 Tips for a Successful Analysis:

Start with questions: Decide which business questions you want to answer
Know your data: Understand the context and limitations of the data
Iterate quickly: Start simple, then add complexity step by step
Validate findings: Cross-check against other sources
Tell a story: Build a narrative that is logical and easy to follow
Be actionable: Give concrete recommendations

📝 Analysis Report Format

Mini Project Report Template

# DATA ANALYSIS REPORT [TITLE]

## 1. EXECUTIVE SUMMARY
- Key findings (3-5 bullet points)
- Main recommendations

## 2. INTRODUCTION
- Background & context
- Objectives
- Data sources

## 3. METHODOLOGY
- Data collection process
- Cleaning steps taken
- Analysis techniques used

## 4. FINDINGS
### 4.1 Descriptive Statistics
- Summary tables
- Key metrics

### 4.2 Trend Analysis
- Time series patterns
- Seasonality

### 4.3 Comparative Analysis
- Benchmarking
- Segmentation

## 5. VISUALIZATIONS
- Include 5-8 key charts
- Clear labels and legends

## 6. INSIGHTS & RECOMMENDATIONS
- Business implications
- Action items
- Future research

## 7. APPENDIX
- Technical details
- Code snippets
- Data dictionary

🎯 Midterm Exam (UTS) Preparation

📚 Skills Checklist for the Midterm

✅ Import and load data in various formats
✅ Identify and handle missing values
✅ Perform systematic data cleaning
✅ Create at least 5 types of visualization
✅ Compute descriptive statistics
✅ Interpret analysis results
✅ Write a structured report
✅ Provide actionable insights

✍️ Conclusion

In this session, students have learned:

✅ Implementing an end-to-end data analysis
✅ Handling complex real-world datasets
✅ Advanced visualization techniques
✅ Comparative analysis across entities
✅ A reusable analysis pipeline
✅ Best practices in reporting

Next session: Midterm exam (UTS) - an independent data analysis project