Documentation
Language Guide
When to use Python vs JavaScript vs R vs SQL vs Bash. A tactical analysis for newsroom deployments.
01 // Overview
Quick reference

| Objective | Language | Rationale |
|---|---|---|
| Data cleaning | Python | pandas library, readable syntax |
| Statistical analysis | R | Built for stats, publication output |
| Web scraping | Python | BeautifulSoup, Playwright |
| Interactive viz | JavaScript | D3.js, runs in browsers |
| File automation | Bash | System native, fast execution |
| Database queries | SQL | Purpose-built for databases |
02 // Python
General purpose
Capabilities
- > Data cleaning and transformation
- > Web scraping
- > API integrations
- > PDF processing
Advantages
- > Readable syntax
- > Huge ecosystem
- > Cross-platform
use_case.py
import pandas as pd
# Load, clean, and analyze campaign finance data
df = pd.read_csv("contributions.csv")
df['date'] = pd.to_datetime(df['date'])
# Group by year and month so the same month in different years is not summed together
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()
print(monthly)
03 // JavaScript
Web interaction
Capabilities
- > Interactive visualizations
- > Browser-based tools
- > Maps and charts
Advantages
- > Runs in browser
- > No user installation
- > D3.js ecosystem
visualize.js
// Interactive bar chart with D3
d3.csv("data.csv").then(data => {
  const svg = d3.select("#chart")
    .append("svg")
    .attr("width", 600)
    .attr("height", 400);
  // ... bindings and rendering
});
04 // R Lang
Statistics
Capabilities
- > Statistical analysis
- > Publication-quality plots
- > Reproducible research
Advantages
- > Built for stats
- > ggplot2 engine
- > Tidyverse syntax
analysis.r
library(tidyverse)
# Analyze election results with statistical summary
results <- read_csv("election_data.csv")
results %>%
  group_by(county) %>%
  summarize(
    avg_turnout = mean(turnout),
    margin = mean(dem_votes - rep_votes)
  ) %>%
  ggplot(aes(x = avg_turnout, y = margin)) +
  geom_point() +
  geom_smooth(method = "lm")
05 // SQL
Database queries
Capabilities
- > Querying databases
- > Filtering and aggregating large datasets
- > Joining tables
- > Data extraction from newsroom databases
Advantages
- > Purpose-built for data queries
- > Fast on large datasets, especially when the filtered columns are indexed
- > Works with most data systems
- > Skills transfer across databases
top_donors.sql
-- Find top donors by total contributions
SELECT
    donor_name,
    COUNT(*) AS num_contributions,
    SUM(amount) AS total_amount
FROM contributions
WHERE date >= '2024-01-01'
GROUP BY donor_name
ORDER BY total_amount DESC
LIMIT 20;
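To try the query above without access to a newsroom database, one option is Python's built-in `sqlite3` module with an in-memory database. This is only a sketch under stated assumptions: the table layout (`donor_name`, `amount`, `date`) is inferred from the query, the sample rows are invented for illustration, and the helper name `top_donors` is hypothetical.

```python
import sqlite3

# Hypothetical sample rows; the schema is inferred from the query above.
SAMPLE_ROWS = [
    ("Alice Example", 500.0, "2024-02-10"),
    ("Alice Example", 250.0, "2024-03-05"),
    ("Bob Sample", 1000.0, "2024-01-15"),
    ("Bob Sample", 300.0, "2023-12-30"),  # before the cutoff, so excluded
]

TOP_DONORS_SQL = """
SELECT
    donor_name,
    COUNT(*) AS num_contributions,
    SUM(amount) AS total_amount
FROM contributions
WHERE date >= ?
GROUP BY donor_name
ORDER BY total_amount DESC
LIMIT 20;
"""


def top_donors(conn, since):
    """Run the top-donors query with the cutoff date passed as a parameter."""
    return conn.execute(TOP_DONORS_SQL, (since,)).fetchall()


# Build a throwaway in-memory database and load the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (donor_name TEXT, amount REAL, date TEXT)")
conn.executemany("INSERT INTO contributions VALUES (?, ?, ?)", SAMPLE_ROWS)

rows = top_donors(conn, "2024-01-01")
for donor_name, num_contributions, total_amount in rows:
    print(donor_name, num_contributions, total_amount)
```

Passing the cutoff as a `?` parameter instead of pasting it into the SQL string keeps the query reusable and avoids quoting mistakes. Dates stored as ISO-formatted text (`YYYY-MM-DD`) compare correctly with `>=` in SQLite.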
06 // Bash
File automation
Capabilities
- > Quick file operations
- > Batch renaming
- > Combining command-line tools
- > Scheduled tasks (cron jobs)
Advantages
- > Already on Mac/Linux
- > Extremely fast for simple tasks
- > Pipes let you chain tools
- > Great for automation scripts
file_ops.sh
# Rename all PDFs to include date prefix
for f in *.pdf; do
  mv "$f" "2024-12-$(basename "$f")"
done
# Count lines in all CSV files
wc -l *.csv
# Find all files modified in the last 24 hours
find . -mtime 0 -type f
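On a Windows machine without Git Bash or WSL, the rename loop above could be approximated in Python. The following is a minimal sketch, not a drop-in replacement: the function name `add_date_prefix` and its `dry_run` flag are hypothetical, and the demo runs in a temporary directory so that no real files are touched.

```python
import tempfile
from datetime import date
from pathlib import Path


def add_date_prefix(folder, prefix=None, dry_run=True):
    """Plan (and optionally perform) renames of PDFs to '<prefix>-<name>'.

    Returns a list of (old_name, new_name) pairs. Files that already carry
    the prefix are skipped, so the function is safe to run more than once.
    """
    # Assumption: default to today's ISO date when no prefix is given.
    prefix = prefix or date.today().isoformat()
    planned = []
    for path in sorted(Path(folder).glob("*.pdf")):
        if path.name.startswith(prefix + "-"):
            continue
        target = path.with_name(f"{prefix}-{path.name}")
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned


# Demo in a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.pdf").write_text("placeholder")
    (Path(tmp) / "notes.csv").write_text("placeholder")
    plan = add_date_prefix(tmp, prefix="2024-12")
    print(plan)
```

With `dry_run=True` the function only reports what it would rename, which is a cheap safeguard before touching a folder of source documents.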
07 // Logic Tree
Selection algorithm
What are you doing?
|
+-> Working with data in spreadsheets/CSVs?
| +-> Python (pandas)
|
+-> Need statistical analysis or p-values?
| +-> R
|
+-> Building something for a web page?
| +-> JavaScript
|
+-> Querying a database?
| +-> SQL
|
+-> Quick file operations?
| +-> Bash
|
+-> Not sure?
+-> Start with Python
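The tree above can also be read as an ordered lookup with a default. A small sketch of that reading follows. The task labels (`spreadsheet_or_csv` and so on) and the function name `pick_language` are hypothetical names chosen for this illustration, not part of any tool.

```python
# Ordered (task, language) pairs mirroring the branches of the tree above.
# The task labels are hypothetical names chosen for this sketch.
DECISION_TREE = [
    ("spreadsheet_or_csv", "Python (pandas)"),
    ("statistics", "R"),
    ("web_page", "JavaScript"),
    ("database_query", "SQL"),
    ("file_operations", "Bash"),
]

# The "Not sure?" branch.
DEFAULT_LANGUAGE = "Python"


def pick_language(task):
    """Return the first matching branch, or the default when nothing matches."""
    for key, language in DECISION_TREE:
        if task == key:
            return language
    return DEFAULT_LANGUAGE


print(pick_language("statistics"))
print(pick_language("something_else"))
```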
08 // Pipeline Integration
Mixing languages
Real projects often combine languages. Each does what it's best at:
1. SQL to extract data from the database
2. Python to clean and transform
3. R to run statistical analysis
4. JavaScript to create interactive visualization
5. Bash to schedule the pipeline
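To show how the hand-offs between stages might look, here is a rough single-file sketch in Python. It is a simplification under stated assumptions: Python's `statistics` module stands in for the R step, JSON output stands in for the hand-off to JavaScript, scheduling is left out, and the table, sample rows, and function names are all invented for illustration.

```python
import json
import sqlite3
import statistics


def extract(conn):
    """Step 1 (SQL): pull raw rows out of the database."""
    return conn.execute("SELECT donor_name, amount FROM contributions").fetchall()


def clean(rows):
    """Step 2 (Python): trim names, drop missing or non-positive amounts."""
    cleaned = []
    for name, amount in rows:
        if name is None or amount is None or amount <= 0:
            continue
        cleaned.append((name.strip(), float(amount)))
    return cleaned


def analyze(rows):
    """Step 3 (stand-in for R): basic summary statistics on the amounts."""
    amounts = [amount for _, amount in rows]
    return {
        "count": len(amounts),
        "mean": statistics.mean(amounts),
        "median": statistics.median(amounts),
    }


def export(summary):
    """Step 4 (hand-off to JavaScript): serialize the summary as JSON."""
    return json.dumps(summary, sort_keys=True)


# Hypothetical in-memory database standing in for a newsroom data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (donor_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?)",
    [("  Alice Example ", 100.0), ("Bob Sample", 300.0),
     ("Bad Row", None), ("Refund", -50.0)],
)

summary = analyze(clean(extract(conn)))
print(export(summary))
```

Keeping each stage as a function with plain inputs and outputs makes it easier to swap one stage for another language later, for example replacing `analyze` with an R script that reads a CSV written by the cleaning step.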
09 // Init Sequence
Getting started
Python
- Install: python.org or Anaconda
- Editor: VS Code + Python extension
- First: Clean a CSV with pandas
JavaScript
- Install: None (runs in browser)
- Editor: VS Code
- First: Chart with Chart.js
R
- Install: r-project.org
- Editor: RStudio (free)
- First: ggplot2 chart
SQL
- Practice: SQLite or online playgrounds
- First: Query a public dataset
Bash
- Install: Already on Mac/Linux
- Windows: Git Bash or WSL
- First: Batch rename files