📘 Session 8 - Midterm Exam (UTS): Independent Data Analysis

๐Ÿ“ INFORMASI UJIAN

  • Bentuk Ujian: Proyek Analisis Data Individual
  • Durasi: 2 minggu (take-home)
  • Tools: Google Colab (wajib)
  • Output: Notebook .ipynb + Laporan PDF
  • Bobot Nilai: 30% dari nilai akhir
  • Pengumpulan: Via Google Drive/LMS

🎯 Midterm Exam Objectives

  1. Test students' ability to apply data science concepts independently
  2. Assess understanding of all material from sessions 1-7
  3. Measure problem-solving skills on real data
  4. Evaluate skill in creating visualizations and interpretations
  5. Assess the ability to communicate analysis results

📚 MIDTERM EXAM BLUEPRINT

A. Material Coverage (100%)

| Session | Topic | Weight | Competencies Assessed |
|---------|-------|--------|-----------------------|
| 1 | Introduction to Data Science & Google Colab | 10% | Environment setup, importing libraries, basic operations |
| 2 | Python Basics for Data Science | 15% | Data types, loops, functions, list comprehensions |
| 3 | Data Manipulation with Pandas | 20% | DataFrame operations, filtering, groupby, merge |
| 4 | Exploratory Data Analysis | 15% | Statistics, distributions, outlier detection |
| 5 | Data Visualization | 20% | Multiple chart types, storytelling with data |
| 6 | Data Cleaning & Transformation | 15% | Missing values, duplicates, normalization |
| 7 | Mini Case Study | 5% | End-to-end analysis workflow |

B. Grading Components

| Component | Weight | Grading Criteria |
|-----------|--------|------------------|
| Data Collection & Import | 10% | Dataset of at least 500 rows • clear data source • successful import |
| Data Cleaning | 20% | Handle missing values • remove duplicates • fix data types • document decisions |
| Exploratory Analysis | 25% | Descriptive statistics • distribution analysis • correlation analysis • pattern identification |
| Visualization | 25% | At least 5 chart types • clear labeling • appropriate chart selection • visual storytelling |
| Insights & Interpretation | 15% | Meaningful insights • business context • actionable recommendations |
| Code Quality & Documentation | 5% | Clean code • comments • Markdown cells • logical flow |

📋 PROJECT GUIDELINES

Step-by-Step Guide

  1. Days 1-2: Dataset Selection
    • Browse Kaggle, UCI ML Repository, data.go.id
    • Pick a topic that interests you
    • Validate the size and completeness of the data
  2. Days 3-4: Data Preparation
    • Import and explore the data (a Colab loading sketch follows this list)
    • Identify data quality issues
    • Create a cleaning pipeline
  3. Days 5-7: Analysis & Visualization
    • Statistical analysis
    • Create visualizations
    • Identify patterns
  4. Days 8-9: Insights & Reporting
    • Summarize findings
    • Write interpretations
    • Create recommendations
  5. Day 10: Review & Submit
    • Code cleanup
    • Final documentation
    • Export to PDF
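
If your dataset lives in Google Drive, mounting the drive in Colab is the simplest way to load it. A minimal sketch, assuming a hypothetical path My Drive/uts/your_dataset.csv:

from google.colab import drive
import pandas as pd

# Mount Google Drive into the Colab filesystem (asks for authorization once)
drive.mount('/content/drive')

# Hypothetical path -- adjust to wherever you stored your CSV
df = pd.read_csv('/content/drive/My Drive/uts/your_dataset.csv')
print(df.shape)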

💻 NOTEBOOK TEMPLATE STRUCTURE

# UTS - INTRODUCTION TO DATA SCIENCE
# Name: [Full Name]
# NIM: [Student ID Number]
# Analysis Title: [Project Title]

"""
## 📊 [DATASET TITLE] ANALYSIS
### Executive Summary
[3-5 sentence summary of the project and its key findings]
"""

# ========================================
# 1. SETUP & IMPORT LIBRARIES
# ========================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Setup visualization style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
print("✅ Libraries imported successfully")

# ========================================
# 2. DATA COLLECTION
# ========================================
"""
### 📥 Dataset Information
- **Source**: [URL or source name]
- **Description**: [dataset description]
- **Size**: [rows x columns]
- **Period**: [timeframe, if any]
"""

# Load data
df = pd.read_csv('your_dataset.csv')
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

# Display first few rows
df.head()

# ========================================
# 3. DATA EXPLORATION
# ========================================
"""
### 🔍 Initial Data Exploration
"""

# Data types
print("Data Types:")
print(df.dtypes)

# Basic statistics (wrapped in print so they display mid-cell)
print("\nStatistical Summary:")
print(df.describe())

# Missing values
print("\nMissing Values:")
print(df.isnull().sum())

# Unique values
for col in df.select_dtypes(include='object').columns:
    print(f"\n{col}: {df[col].nunique()} unique values")
    if df[col].nunique() < 10:
        print(df[col].value_counts())

# ========================================
# 4. DATA CLEANING
# ========================================
"""
### 🧹 Data Cleaning Process
"""

# Document original shape
original_shape = df.shape
print(f"Original shape: {original_shape}")

# 1. Handle missing values
# [Strategy explanation]
df_clean = df.copy()

# Example: fill numeric columns with the median
numeric_cols = df_clean.select_dtypes(include=[np.number]).columns
for col in numeric_cols:
    if df_clean[col].isnull().sum() > 0:
        # Assigning back avoids pandas chained-assignment warnings
        df_clean[col] = df_clean[col].fillna(df_clean[col].median())
        print(f"Filled {col} with median")
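
# Example: fill categorical (object) columns with the mode -- one common
# strategy; pick what fits your data and document the choice
object_cols = df_clean.select_dtypes(include='object').columns
for col in object_cols:
    if df_clean[col].isnull().sum() > 0:
        df_clean[col] = df_clean[col].fillna(df_clean[col].mode()[0])
        print(f"Filled {col} with mode")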

# 2. Remove duplicates
duplicates = df_clean.duplicated().sum()
if duplicates > 0:
    df_clean = df_clean.drop_duplicates()
    print(f"Removed {duplicates} duplicate rows")

# 3. Fix data types
# [Convert as needed]

# Final shape
print(f"\nFinal shape after cleaning: {df_clean.shape}")
print(f"Rows removed: {original_shape[0] - df_clean.shape[0]}")

# ========================================
# 5. FEATURE ENGINEERING
# ========================================
"""
### 🔧 Feature Engineering
"""

# Create new meaningful features
# Example:
# df_clean['new_feature'] = df_clean['col1'] / df_clean['col2']
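
# A hedged illustration with hypothetical column names -- deriving date
# parts from a datetime column often reveals daily/monthly patterns:
# df_clean['order_date'] = pd.to_datetime(df_clean['order_date'])
# df_clean['order_month'] = df_clean['order_date'].dt.month
# df_clean['order_day'] = df_clean['order_date'].dt.day_name()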

# ========================================
# 6. STATISTICAL ANALYSIS
# ========================================
"""
### 📊 Statistical Analysis
"""

# Correlation matrix
correlation_matrix = df_clean.select_dtypes(include=[np.number]).corr()

# Distribution analysis
# [Analyze key variables]

# Outlier detection with the IQR rule (numeric columns only; calling
# quantile on mixed dtypes raises an error in recent pandas)
numeric_df = df_clean.select_dtypes(include=[np.number])
Q1 = numeric_df.quantile(0.25)
Q3 = numeric_df.quantile(0.75)
IQR = Q3 - Q1
outlier_counts = ((numeric_df < Q1 - 1.5 * IQR) | (numeric_df > Q3 + 1.5 * IQR)).sum()
print("Outliers per column (IQR rule):")
print(outlier_counts)

# ========================================
# 7. DATA VISUALIZATION
# ========================================
"""
### 📈 Data Visualization
"""

# Create figure with subplots
fig = plt.figure(figsize=(20, 15))
fig.suptitle('Data Analysis Dashboard', fontsize=20, fontweight='bold')

# Visualization 1: Distribution
ax1 = plt.subplot(3, 3, 1)
# [Your code here]
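# e.g., a hedged example assuming a numeric column of interest named 'value':
# ax1.hist(df_clean['value'], bins=30, edgecolor='black')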
ax1.set_title('Distribution of [Variable]')

# Visualization 2: Correlation Heatmap
ax2 = plt.subplot(3, 3, 2)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
ax2.set_title('Correlation Matrix')

# Visualization 3: Time Series (if applicable)
ax3 = plt.subplot(3, 3, 3)
# [Your code here]

# Visualization 4: Bar Chart
ax4 = plt.subplot(3, 3, 4)
# [Your code here]

# Visualization 5: Scatter Plot
ax5 = plt.subplot(3, 3, 5)
# [Your code here]

# Add more visualizations...

plt.tight_layout()
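# Optional: save the dashboard as an image for your report (save before
# plt.show() so the rendered figure is kept intact)
# fig.savefig('dashboard.png', dpi=150, bbox_inches='tight')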
plt.show()

# ========================================
# 8. KEY INSIGHTS
# ========================================
"""
### 💡 Key Insights

1. **Finding 1**: [Description and interpretation]
2. **Finding 2**: [Description and interpretation]
3. **Finding 3**: [Description and interpretation]
4. **Finding 4**: [Description and interpretation]
5. **Finding 5**: [Description and interpretation]
"""

# ========================================
# 9. CONCLUSIONS & RECOMMENDATIONS
# ========================================
"""
### 📌 Conclusions

[Summary of main findings]

### 🎯 Recommendations

1. **Business Recommendation 1**: [Actionable insight]
2. **Business Recommendation 2**: [Actionable insight]
3. **Business Recommendation 3**: [Actionable insight]

### 🔮 Future Work

- [Suggestion for further analysis]
- [Additional data that could be useful]
- [Advanced techniques to apply]
"""

# ========================================
# 10. REFERENCES
# ========================================
"""
### 📚 References

- Dataset Source: [URL]
- Documentation: [Any references used]
- Additional Resources: [If any]
"""

๐Ÿ“ CONTOH SOAL UTS

Contoh Kasus 1: E-Commerce Customer Analysis

Dataset: Online Retail Dataset dari UCI

Tasks:

  1. Load dan explore dataset transaksi online retail
  2. Clean data (handle negatives, missing values)
  3. Perform RFM (Recency, Frequency, Monetary) analysis
  4. Segment customers into groups
  5. Visualize customer distribution dan spending patterns
  6. Identify top products dan seasonal trends
  7. Provide recommendations untuk marketing strategy

Sample Solution Code:

# RFM Analysis Example
import datetime as dt

# Make sure InvoiceDate is a datetime and derive the amount per line
# (the UCI Online Retail data has Quantity and UnitPrice columns; adjust
# the names if your file differs)
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
df['TotalAmount'] = df['Quantity'] * df['UnitPrice']

# Calculate RFM metrics relative to the day after the last transaction
snapshot_date = df['InvoiceDate'].max() + dt.timedelta(days=1)

rfm = df.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalAmount': 'sum'
})

rfm.rename(columns={
    'InvoiceDate': 'Recency',
    'InvoiceNo': 'Frequency',
    'TotalAmount': 'Monetary'
}, inplace=True)

# Create RFM scores (1-5 per dimension); rank(method='first') breaks
# ties so qcut can always form five equal-sized bins
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1])
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5])

rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)

# Customer segmentation
def segment_customers(row):
    if row['RFM_Segment'] == '555':
        return 'Champions'
    elif row['RFM_Segment'] == '554' or row['RFM_Segment'] == '544':
        return 'Loyal Customers'
    elif row['RFM_Segment'] == '553' or row['RFM_Segment'] == '551':
        return 'Potential Loyalists'
    elif row['RFM_Segment'] == '522' or row['RFM_Segment'] == '521':
        return 'New Customers'
    elif row['RFM_Segment'][0] in ['4','5'] and row['RFM_Segment'][1] in ['1','2']:
        return 'At Risk'
    elif row['RFM_Segment'][0] in ['1','2'] and row['RFM_Segment'][1] in ['1','2']:
        return 'Lost'
    else:
        return 'Others'

rfm['Segment'] = rfm.apply(segment_customers, axis=1)
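
To sanity-check the segmentation, a quick look at segment sizes and average spend is useful:

# How many customers fall into each segment, and what do they spend?
print(rfm['Segment'].value_counts())
print(rfm.groupby('Segment')['Monetary'].mean().round(2))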

Sample Case 2: Air Quality Analysis

Dataset: Air Quality Dataset from data.jakarta.go.id

Tasks:

  1. Import Jakarta air quality data
  2. Handle missing measurements
  3. Analyze pollutant levels (PM2.5, PM10, CO, NO2)
  4. Identify patterns: daily, weekly, seasonal (see the sketch after this list)
  5. Compare pollution levels across districts
  6. Correlate with weather data if available
  7. Create a health risk assessment visualization
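
One common way to surface daily, weekly, and seasonal patterns (task 4) is to resample a datetime-indexed series. A minimal sketch with hypothetical file and column names, tanggal (date) and pm25 -- adjust them to the actual dataset:

import pandas as pd

# Hypothetical file and column names
air = pd.read_csv('air_quality_jakarta.csv', parse_dates=['tanggal'])
air = air.set_index('tanggal').sort_index()

# Monthly mean PM2.5 highlights seasonal movement
monthly_pm25 = air['pm25'].resample('MS').mean()

# Mean by day of week highlights weekly cycles
weekday_pm25 = air['pm25'].groupby(air.index.day_name()).mean()

print(monthly_pm25.head())
print(weekday_pm25)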

Sample Case 3: Student Performance Analysis

Dataset: Students Performance Dataset from Kaggle

Tasks:

  1. Load student academic performance data
  2. Clean and standardize scores
  3. Analyze factors affecting performance
  4. Compare performance across demographics
  5. Identify at-risk students (a sketch for tasks 4-5 follows this list)
  6. Visualize grade distributions
  7. Recommend intervention strategies
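
A hedged sketch for tasks 4-5, assuming the column names used in the well-known Kaggle StudentsPerformance.csv file ('math score', 'reading score', 'writing score', 'gender') -- verify them against your copy:

import pandas as pd

students = pd.read_csv('StudentsPerformance.csv')

# Average score across the three subjects
score_cols = ['math score', 'reading score', 'writing score']
students['avg_score'] = students[score_cols].mean(axis=1)

# Compare performance across a demographic variable
print(students.groupby('gender')['avg_score'].mean().round(1))

# Flag at-risk students: average below 60 (the threshold is an assumption)
at_risk = students[students['avg_score'] < 60]
print(f"At-risk students: {len(at_risk)} ({len(at_risk) / len(students):.1%})")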

🎯 TIPS FOR MIDTERM SUCCESS

Do's ✅

  • Choose a dataset whose context you understand
  • Start early - don't wait for the deadline
  • Document every decision you make
  • Use clear variable names
  • Create meaningful visualizations
  • Focus on insights, not just code
  • Test your code multiple times
  • Ask questions if anything is unclear

Don'ts ❌

  • Don't use a dataset with fewer than 500 rows
  • Don't skip data cleaning
  • Don't create charts without labels
  • Don't copy code without understanding it
  • Don't submit without reviewing
  • Don't ignore missing values
  • Don't forget to interpret results
  • Don't plagiarize

📊 DETAILED GRADING RUBRIC

| Grade | Range | Criteria |
|-------|-------|----------|
| A | 85-100 | Exceptional analysis depth • creative visualizations • strong business insights • clean, well-documented code • goes beyond requirements |
| B | 70-84 | All requirements complete • good visualizations • clear interpretations • minor code issues • solid understanding shown |
| C | 55-69 | Most requirements met • basic visualizations • some interpretations missing • several code issues • understanding needs improvement |
| D | 40-54 | Incomplete analysis • poor visualizations • weak interpretations • major code errors • minimal effort shown |
| E | <40 | Missing major components • no clear analysis • code doesn't run • no interpretations • plagiarism detected |

📅 TIMELINE & DEADLINES

| Date | Activity | Output |
|------|----------|--------|
| Monday, Week 1 | Midterm exam questions distributed | Receive instructions |
| Tuesday-Wednesday | Dataset selection & exploration | Dataset chosen |
| Thursday-Friday | Data cleaning & preparation | Clean dataset |
| Monday, Week 2 | Analysis & visualization | Charts created |
| Tuesday-Wednesday | Insights & interpretation | Findings documented |
| Thursday | Final review & formatting | Complete notebook |
| Friday, 23:59 | SUBMISSION DEADLINE | .ipynb + PDF |

🆘 FAQ & TROUBLESHOOTING

Frequently Asked Questions

Q: Can I use a dataset from any source?
A: Yes, as long as it comes from a trusted source and has at least 500 rows. Recommended: Kaggle, UCI, data.go.id

Q: Is collaboration allowed?
A: Discussing concepts is allowed, but the code and analysis must be your own. Plagiarism = grade E

Q: What if the dataset has many missing values?
A: Document your cleaning strategy and explain why you chose a particular method

Q: What is the minimum number of visualizations?
A: At least 5 different chart types that are meaningful, not just a large number of charts

Q: Is machine learning required?
A: Not required for the midterm. Focus on EDA and insights

Q: What is the submission format?
A: .ipynb file (required) + PDF export. File name: UTS_NIM_Nama.ipynb (one way to produce the PDF is sketched below)
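
One common way to produce the PDF from inside Colab is nbconvert with a LaTeX backend; this assumes the .ipynb file is present in the Colab working directory (e.g., copied there from Drive). Using File > Print > Save as PDF in the browser also works:

# Run in a Colab cell; the TeX packages are large and take a few minutes
!apt-get install -y texlive-xetex texlive-fonts-recommended texlive-plain-generic
!jupyter nbconvert --to pdf "UTS_NIM_Nama.ipynb"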

๐Ÿ† CONTOH PROJECT TERBAIK

Karakteristik Project Grade A:

  • Dataset Choice: Relevant dan interesting topic
  • Data Cleaning: Thorough dengan justifikasi clear
  • Analysis Depth: Multiple perspectives explored
  • Visualizations: Creative, informative, well-labeled
  • Insights: Deep, actionable, business-relevant
  • Code Quality: Clean, efficient, well-commented
  • Presentation: Professional, easy to follow
  • Extra Mile: Additional analysis beyond requirements

โœ๏ธ Kesimpulan & Persiapan

Checklist Persiapan UTS

  • โ˜ Review semua materi pertemuan 1-7
  • โ˜ Pastikan bisa menggunakan Pandas, Matplotlib, Seaborn
  • โ˜ Latihan dengan minimal 2 dataset berbeda
  • โ˜ Siapkan template notebook
  • โ˜ Bookmark dokumentasi Python/Pandas
  • โ˜ Test Google Colab environment
  • โ˜ Backup code regularly

Remember: UTS bukan hanya tentang coding, tapi kemampuan menganalisis, menginterpretasi, dan mengkomunikasikan temuan dari data. Quality over quantity!

🎯 Good luck! You've got this! 🚀