Session 8 - Midterm Exam (UTS): Independent Data Analysis
EXAM INFORMATION
- Exam format: Individual data analysis project
- Duration: 2 weeks (take-home)
- Tools: Google Colab (required)
- Output: .ipynb notebook + PDF report
- Grade weight: 30% of the final grade
- Submission: via Google Drive/LMS
UTS Objectives
- Test students' ability to apply data science concepts independently
- Assess understanding of all material from sessions 1-7
- Measure problem-solving skills on real data
- Evaluate skills in creating visualizations and interpretations
- Assess the ability to communicate analysis results
UTS BLUEPRINT
A. Material Coverage (100%)
| Session | Topic | Weight | Competencies Tested |
|---|---|---|---|
| 1 | Introduction to Data Science & Google Colab | 10% | Setting up the environment, importing libraries, basic operations |
| 2 | Python Basics for Data Science | 15% | Data types, loops, functions, list comprehensions |
| 3 | Data Manipulation with Pandas | 20% | DataFrame operations, filtering, groupby, merge |
| 4 | Exploratory Data Analysis | 15% | Statistics, distributions, outlier detection |
| 5 | Data Visualization | 20% | Multiple chart types, storytelling with data |
| 6 | Data Cleaning & Transformation | 15% | Missing values, duplicates, normalization |
| 7 | Mini Case Study | 5% | End-to-end analysis workflow |
B. Assessment Components
| Component | Weight | Assessment Criteria |
|---|---|---|
| Data Collection & Import | 10% | • Dataset of at least 500 rows • Clear data source • Successful import |
| Data Cleaning | 20% | • Handle missing values • Remove duplicates • Fix data types • Document decisions |
| Exploratory Analysis | 25% | • Descriptive statistics • Distribution analysis • Correlation analysis • Pattern identification |
| Visualization | 25% | • At least 5 chart types • Clear labeling • Appropriate chart selection • Visual storytelling |
| Insights & Interpretation | 15% | • Meaningful insights • Business context • Actionable recommendations |
| Code Quality & Documentation | 5% | • Clean code • Comments • Markdown cells • Logical flow |
WORK GUIDELINES
Step-by-Step Guide
- Days 1-2: Dataset Selection
  - Browse Kaggle, UCI ML Repository, data.go.id
  - Pick a topic that interests you
  - Validate dataset size and completeness (see the sketch after this list)
- Days 3-4: Data Preparation
  - Import and explore the data
  - Identify data quality issues
  - Create a cleaning pipeline
- Days 5-7: Analysis & Visualization
  - Run statistical analyses
  - Create visualizations
  - Identify patterns
- Days 8-9: Insights & Reporting
  - Summarize findings
  - Write interpretations
  - Formulate recommendations
- Day 10: Review & Submit
  - Clean up the code
  - Finalize documentation
  - Export to PDF
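A quick sanity check like the sketch below helps confirm that a candidate dataset meets the minimum-size requirement before you commit to it (the filename is a placeholder):
import pandas as pd

df = pd.read_csv('candidate_dataset.csv')  # placeholder filename
print(f"Rows: {len(df)} (minimum required: 500)")
print(f"Columns: {df.shape[1]}")
print("Missing values per column:")
print(df.isnull().sum())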
NOTEBOOK TEMPLATE STRUCTURE
# UTS - INTRODUCTION TO DATA SCIENCE
# Name: [Full Name]
# NIM: [NIM]
# Analysis Title: [Project Title]
"""
## ANALYSIS OF [DATASET TITLE]
### Executive Summary
[3-5 sentence summary of the project and its key findings]
"""
# ========================================
# 1. SETUP & IMPORT LIBRARIES
# ========================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Setup visualization style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
print("โ
Libraries imported successfully")
# ========================================
# 2. DATA COLLECTION
# ========================================
"""
### Dataset Information
- **Source**: [URL or source name]
- **Description**: [Dataset description]
- **Size**: [rows x columns]
- **Period**: [timeframe, if applicable]
"""
# Load data
df = pd.read_csv('your_dataset.csv')
print(f"Dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
# Display first few rows
df.head()
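# If the CSV lives in Google Drive (the exam is done in Colab), one common
# pattern is to mount the drive first -- an optional sketch; the path below
# is a placeholder:
# from google.colab import drive
# drive.mount('/content/drive')
# df = pd.read_csv('/content/drive/MyDrive/your_dataset.csv')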
# ========================================
# 3. DATA EXPLORATION
# ========================================
"""
### Initial Data Exploration
"""
# Data types
print("Data Types:")
print(df.dtypes)
# Basic statistics (wrapped in print() so the output shows mid-cell)
print("\nStatistical Summary:")
print(df.describe())
# Missing values
print("\nMissing Values:")
print(df.isnull().sum())
# Unique values
for col in df.select_dtypes(include='object').columns:
    print(f"\n{col}: {df[col].nunique()} unique values")
    if df[col].nunique() < 10:
        print(df[col].value_counts())
# ========================================
# 4. DATA CLEANING
# ========================================
"""
### Data Cleaning Process
"""
# Document original shape
original_shape = df.shape
print(f"Original shape: {original_shape}")
# 1. Handle missing values
# [Strategy explanation]
df_clean = df.copy()
# Example: fill numeric columns with the median
numeric_cols = df_clean.select_dtypes(include=[np.number]).columns
for col in numeric_cols:
    if df_clean[col].isnull().sum() > 0:
        # assignment instead of inplace fillna avoids pandas chained-assignment warnings
        df_clean[col] = df_clean[col].fillna(df_clean[col].median())
        print(f"Filled {col} with median")
# 2. Remove duplicates
duplicates = df_clean.duplicated().sum()
if duplicates > 0:
    df_clean = df_clean.drop_duplicates()
    print(f"Removed {duplicates} duplicate rows")
# 3. Fix data types
# [Convert as needed]
# Final shape
print(f"\nFinal shape after cleaning: {df_clean.shape}")
print(f"Rows removed: {original_shape[0] - df_clean.shape[0]}")
# ========================================
# 5. FEATURE ENGINEERING
# ========================================
"""
### Feature Engineering
"""
# Create new meaningful features
# Example:
# df_clean['new_feature'] = df_clean['col1'] / df_clean['col2']
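# A few more sketches of commonly engineered features -- the column names
# below are placeholders, not from a real dataset; adapt them to yours:
# df_clean['price_per_unit'] = df_clean['total_price'] / df_clean['quantity']
# df_clean['order_month'] = pd.to_datetime(df_clean['order_date']).dt.month
# df_clean['is_weekend'] = pd.to_datetime(df_clean['order_date']).dt.dayofweek >= 5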
# ========================================
# 6. STATISTICAL ANALYSIS
# ========================================
"""
### Statistical Analysis
"""
# Correlation matrix
correlation_matrix = df_clean.select_dtypes(include=[np.number]).corr()
# Distribution analysis
# [Analyze key variables]
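# A small distribution sketch (numeric_cols comes from the cleaning step above):
print("Skewness of numeric variables:")
print(df_clean[numeric_cols].skew().sort_values())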
# Outlier detection with the 1.5 * IQR rule (numeric columns only)
numeric_df = df_clean.select_dtypes(include=[np.number])
Q1 = numeric_df.quantile(0.25)
Q3 = numeric_df.quantile(0.75)
IQR = Q3 - Q1
outliers = ((numeric_df < Q1 - 1.5 * IQR) | (numeric_df > Q3 + 1.5 * IQR)).sum()
print("Outliers per column:")
print(outliers)
# ========================================
# 7. DATA VISUALIZATION
# ========================================
"""
### Data Visualization
"""
# Create figure with subplots
fig = plt.figure(figsize=(20, 15))
fig.suptitle('Data Analysis Dashboard', fontsize=20, fontweight='bold')
# Visualization 1: Distribution
ax1 = plt.subplot(3, 3, 1)
# [Your code here]
ax1.set_title('Distribution of [Variable]')
# Visualization 2: Correlation Heatmap
ax2 = plt.subplot(3, 3, 2)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
ax2.set_title('Correlation Matrix')
# Visualization 3: Time Series (if applicable)
ax3 = plt.subplot(3, 3, 3)
# [Your code here]
# Visualization 4: Bar Chart
ax4 = plt.subplot(3, 3, 4)
# [Your code here]
# Visualization 5: Scatter Plot
ax5 = plt.subplot(3, 3, 5)
# [Your code here]
# Add more visualizations...
plt.tight_layout()
plt.show()
# ========================================
# 8. KEY INSIGHTS
# ========================================
"""
### Key Insights
1. **Finding 1**: [Description and interpretation]
2. **Finding 2**: [Description and interpretation]
3. **Finding 3**: [Description and interpretation]
4. **Finding 4**: [Description and interpretation]
5. **Finding 5**: [Description and interpretation]
"""
# ========================================
# 9. CONCLUSIONS & RECOMMENDATIONS
# ========================================
"""
### Conclusions
[Summary of main findings]
### Recommendations
1. **Business Recommendation 1**: [Actionable insight]
2. **Business Recommendation 2**: [Actionable insight]
3. **Business Recommendation 3**: [Actionable insight]
### Future Work
- [Suggestion for further analysis]
- [Additional data that could be useful]
- [Advanced techniques to apply]
"""
# ========================================
# 10. REFERENCES
# ========================================
"""
### References
- Dataset Source: [URL]
- Documentation: [Any references used]
- Additional Resources: [If any]
"""
SAMPLE UTS PROBLEMS
Sample Case 1: E-Commerce Customer Analysis
Dataset: Online Retail Dataset from UCI
Tasks:
- Load and explore the online retail transaction dataset
- Clean the data (handle negative values and missing values)
- Perform an RFM (Recency, Frequency, Monetary) analysis
- Segment customers into groups
- Visualize customer distribution and spending patterns
- Identify top products and seasonal trends
- Provide recommendations for the marketing strategy
Sample Solution Code:
# RFM Analysis Example
import datetime as dt

# Parse dates and build a TotalAmount column (the raw UCI file only has
# Quantity and UnitPrice)
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
df['TotalAmount'] = df['Quantity'] * df['UnitPrice']

# Calculate RFM metrics
snapshot_date = df['InvoiceDate'].max() + dt.timedelta(days=1)
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalAmount': 'sum'
})
rfm.rename(columns={
    'InvoiceDate': 'Recency',
    'InvoiceNo': 'Frequency',
    'TotalAmount': 'Monetary'
}, inplace=True)
# Create RFM segments
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5, 4, 3, 2, 1])
# rank(method='first') breaks ties so qcut can always form 5 equal-sized bins
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1, 2, 3, 4, 5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1, 2, 3, 4, 5])
rfm['RFM_Segment'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)
# Customer segmentation
def segment_customers(row):
    if row['RFM_Segment'] == '555':
        return 'Champions'
    elif row['RFM_Segment'] in ('554', '544'):
        return 'Loyal Customers'
    elif row['RFM_Segment'] in ('553', '551'):
        return 'Potential Loyalists'
    elif row['RFM_Segment'] in ('522', '521'):
        return 'New Customers'
    elif row['RFM_Segment'][0] in ('4', '5') and row['RFM_Segment'][1] in ('1', '2'):
        return 'At Risk'
    elif row['RFM_Segment'][0] in ('1', '2') and row['RFM_Segment'][1] in ('1', '2'):
        return 'Lost'
    else:
        return 'Others'

rfm['Segment'] = rfm.apply(segment_customers, axis=1)
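Once every customer is labeled, `rfm['Segment'].value_counts()` gives the size of each segment, which feeds directly into the customer-distribution chart asked for in the tasks above.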
Sample Case 2: Air Quality Analysis
Dataset: Air Quality Dataset from data.jakarta.go.id
Tasks:
- Import Jakarta air quality data
- Handle missing measurements
- Analyze pollutant levels (PM2.5, PM10, CO, NO2)
- Identify patterns: daily, weekly, seasonal
- Compare pollution levels across districts
- Correlate with weather data, if available
- Create a health risk assessment visualization
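A minimal sketch for the daily/weekly/seasonal pattern tasks, assuming hypothetical column names 'tanggal' (timestamp) and 'pm25'; adjust them to the actual file:
import pandas as pd

aq = pd.read_csv('air_quality_jakarta.csv')  # placeholder filename
aq['tanggal'] = pd.to_datetime(aq['tanggal'])
aq = aq.set_index('tanggal')

daily = aq['pm25'].resample('D').mean()                 # daily averages
weekly = aq['pm25'].groupby(aq.index.dayofweek).mean()  # pattern across weekdays
seasonal = aq['pm25'].groupby(aq.index.month).mean()    # pattern across months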
Sample Case 3: Student Performance Analysis
Dataset: Students Performance Dataset from Kaggle
Tasks:
- Load the student academic performance data
- Clean and standardize the scores
- Analyze factors affecting performance
- Compare performance across demographics
- Identify at-risk students
- Visualize grade distributions
- Recommend intervention strategies
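A minimal sketch for the demographic comparison and at-risk tasks, assuming hypothetical column names 'gender' and 'math_score'; adapt them to the dataset you download:
import pandas as pd

students = pd.read_csv('StudentsPerformance.csv')  # placeholder filename
# Compare score distributions across a demographic group
print(students.groupby('gender')['math_score'].describe())
# One possible at-risk cutoff: more than one standard deviation below the mean
cutoff = students['math_score'].mean() - students['math_score'].std()
at_risk = students[students['math_score'] < cutoff]
print(f"Potentially at-risk students: {len(at_risk)}")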
TIPS FOR UTS SUCCESS
Do's
- Pick a dataset whose context you understand
- Start early - don't wait for the deadline
- Document every decision you make
- Use clear variable names
- Create meaningful visualizations
- Focus on insights, not just code
- Test your code multiple times
- Ask questions if anything is unclear
Don'ts
- Don't use a dataset with fewer than 500 rows
- Don't skip data cleaning
- Don't create charts without labels
- Don't copy code without understanding
- Don't submit without reviewing
- Don't ignore missing values
- Don't forget to interpret results
- Don't plagiarize
DETAILED GRADING RUBRIC
| Grade | Range | Criteria |
|---|---|---|
| A | 85-100 | • Exceptional analysis depth • Creative visualizations • Strong business insights • Clean, well-documented code • Goes beyond requirements |
| B | 70-84 | • All requirements complete • Good visualizations • Clear interpretations • Minor code issues • Solid understanding shown |
| C | 55-69 | • Most requirements met • Basic visualizations • Some interpretations missing • Several code issues • Understanding needs improvement |
| D | 40-54 | • Incomplete analysis • Poor visualizations • Weak interpretations • Major code errors • Minimal effort shown |
| E | <40 | • Major components missing • No clear analysis • Code doesn't run • No interpretations • Plagiarism detected |
TIMELINE & DEADLINES
| Date | Activity | Output |
|---|---|---|
| Monday, Week 1 | UTS instructions handed out | Receive instructions |
| Tuesday-Wednesday | Dataset selection & exploration | Dataset chosen |
| Thursday-Friday | Data cleaning & preparation | Clean dataset |
| Monday, Week 2 | Analysis & visualization | Charts created |
| Tuesday-Wednesday | Insights & interpretation | Findings documented |
| Thursday | Final review & formatting | Complete notebook |
| Friday, 23:59 | SUBMISSION DEADLINE | .ipynb + PDF |
FAQ & TROUBLESHOOTING
Frequently Asked Questions
Q: Can I use a dataset from any source?
A: Yes, as long as it comes from a trustworthy source and has at least 500 rows. Recommended: Kaggle, UCI, data.go.id
Q: Is collaboration allowed?
A: Discussing concepts is fine, but the code and analysis must be your own. Plagiarism = grade E
Q: What if the dataset has many missing values?
A: Document your cleaning strategy and explain why you chose a particular method
Q: What is the minimum number of visualizations?
A: At least 5 different, meaningful chart types - not just a large number of charts
Q: Is machine learning required?
A: Not for the UTS. Focus on EDA and insights
Q: What is the submission format?
A: An .ipynb file (required) plus a PDF export. File name: UTS_NIM_Nama.ipynb
WHAT A TOP PROJECT LOOKS LIKE
Characteristics of a Grade A project:
- Dataset Choice: Relevant and interesting topic
- Data Cleaning: Thorough, with clear justification
- Analysis Depth: Multiple perspectives explored
- Visualizations: Creative, informative, well labeled
- Insights: Deep, actionable, business-relevant
- Code Quality: Clean, efficient, well commented
- Presentation: Professional, easy to follow
- Extra Mile: Additional analysis beyond the requirements
Conclusion & Preparation
UTS Preparation Checklist
- Review all material from sessions 1-7
- Make sure you can use Pandas, Matplotlib, and Seaborn
- Practice with at least 2 different datasets
- Prepare a notebook template
- Bookmark the Python/Pandas documentation
- Test your Google Colab environment
- Back up your code regularly
Remember: the UTS is not just about coding - it is about analyzing, interpreting, and communicating findings from data. Quality over quantity!
Good luck! You've got this!