LANGUAGE_GUIDE
When to use Python, JavaScript, R, SQL, or Bash. A tactical analysis for newsroom deployments.
01 // OVERVIEW
QUICK REFERENCE
| Objective | Protocol | Rationale |
|---|---|---|
| Data cleaning | PYTHON | pandas library, readable syntax |
| Statistical analysis | R | Built for stats, publication output |
| Web scraping | PYTHON | BeautifulSoup, Playwright |
| Interactive viz | JAVASCRIPT | D3.js, runs in browsers |
| File automation | BASH | System native, fast execution |
| Database queries | SQL | Purpose-built for databases |
02 // PYTHON
GENERAL PURPOSE
Capabilities
- > Data cleaning and transformation
- > Web scraping
- > API integrations
- > PDF processing
Advantages
- > Readable syntax
- > Huge ecosystem
- > Cross-platform
USE_CASE.py
import pandas as pd
# Load, clean, and analyze campaign finance data
df = pd.read_csv("contributions.csv")
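df['date'] = pd.to_datetime(df['date'])  # parse date strings into timestamps
# Group by year and month so the same month in different years stays separate
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()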
print(monthly)
03 // JAVASCRIPT
WEB INTERACTION
Capabilities
- > Interactive visualizations
- > Browser-based tools
- > Maps and charts
Advantages
- > Runs in browser
- > No user installation
- > D3.js ecosystem
VISUALIZE.js
// Interactive bar chart with D3
d3.csv("data.csv").then(data => {
  const svg = d3.select("#chart")
    .append("svg")
    .attr("width", 600)
    .attr("height", 400);
  // ... bindings and rendering
});
04 // R_LANG
STATISTICS
Capabilities
- > Statistical analysis
- > Publication-quality plots
- > Reproducible research
Advantages
- > Built for stats
- > ggplot2 engine
- > Tidyverse syntax
ANALYSIS.r
library(tidyverse)
# Analyze election results with statistical summary
results <- read_csv("election_data.csv")
results %>%
  group_by(county) %>%
  summarize(
    avg_turnout = mean(turnout),
    margin = mean(dem_votes - rep_votes)
  ) %>%
  ggplot(aes(x = avg_turnout, y = margin)) +
  geom_point() +
  geom_smooth(method = "lm")
05 // SQL
DATABASE QUERIES
Capabilities
- > Querying databases
- > Filtering and aggregating large datasets
- > Joining tables
- > Data extraction from newsroom databases
Advantages
- > Purpose-built for data queries
- > Extremely fast on large datasets
- > Works with most data systems
- > Skills transfer across databases
TOP_DONORS.sql
-- Find top donors by total contributions
SELECT
    donor_name,
    COUNT(*) AS num_contributions,
    SUM(amount) AS total_amount
FROM contributions
WHERE date >= '2024-01-01'
GROUP BY donor_name
ORDER BY total_amount DESC
LIMIT 20;
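To try the query without access to a newsroom database, one option is Python's built-in `sqlite3` module with an in-memory database. The sketch below is only an illustration: the sample rows and donor names are invented, the helper name `top_donors` is hypothetical, and the table layout (`donor_name`, `amount`, `date`) is assumed from the query above.

```python
import sqlite3

# Invented sample rows; the schema is assumed from the query in this section.
SAMPLE_ROWS = [
    ("Alice Example", 500.0, "2024-02-10"),
    ("Bob Sample", 250.0, "2024-03-05"),
    ("Alice Example", 300.0, "2024-06-21"),
    ("Carol Demo", 900.0, "2023-11-30"),  # before the cutoff, so filtered out
]

TOP_DONORS_SQL = """
SELECT
    donor_name,
    COUNT(*) AS num_contributions,
    SUM(amount) AS total_amount
FROM contributions
WHERE date >= '2024-01-01'
GROUP BY donor_name
ORDER BY total_amount DESC
LIMIT 20;
"""


def top_donors(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE contributions (donor_name TEXT, amount REAL, date TEXT)"
        )
        conn.executemany("INSERT INTO contributions VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_DONORS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, count, total in top_donors(SAMPLE_ROWS):
        print(name, count, total)
```

The `WHERE` filter compares dates as text here, which only works because the sample dates are stored in ISO format (`YYYY-MM-DD`); other date formats would need parsing first.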
06 // BASH
FILE AUTOMATION
Capabilities
- > Quick file operations
- > Batch renaming
- > Combining command-line tools
- > Scheduled tasks (cron jobs)
Advantages
- > Already on Mac/Linux
- > Extremely fast for simple tasks
- > Pipes let you chain tools
- > Great for automation scripts
FILE_OPS.sh
# Rename all PDFs to include date prefix
for f in *.pdf; do
  mv "$f" "2024-12-$(basename "$f")"
done
# Count lines in all CSV files
wc -l *.csv
# Find all files modified in the last 24 hours
find . -mtime 0 -type f
07 // LOGIC_TREE
SELECTION ALGORITHM
What are you doing?
│
├─► Working with data in spreadsheets/CSVs?
│   └─► Python (pandas)
│
├─► Need statistical analysis or p-values?
│   └─► R
│
├─► Building something for a web page?
│   └─► JavaScript
│
├─► Querying a database?
│   └─► SQL
│
├─► Quick file operations?
│   └─► Bash
│
└─► Not sure?
    └─► START WITH PYTHON
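The tree above can also be read as a simple lookup with a default. A minimal Python sketch, assuming hypothetical task labels (`spreadsheet`, `statistics`, `web`, `database`, `files`) and a hypothetical function name `pick_language`, none of which appear in the guide itself:

```python
# Hypothetical task labels mapped to the guide's recommendations.
RECOMMENDATIONS = {
    "spreadsheet": "Python (pandas)",
    "statistics": "R",
    "web": "JavaScript",
    "database": "SQL",
    "files": "Bash",
}


def pick_language(task: str) -> str:
    """Return the suggested language for a task label, defaulting to Python."""
    return RECOMMENDATIONS.get(task.strip().lower(), "Python")


if __name__ == "__main__":
    for task in ("database", "statistics", "something else"):
        print(f"{task}: {pick_language(task)}")
```

Anything that does not match a known label falls through to Python, mirroring the "Not sure?" branch.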
08 // PIPELINE_INTEGRATION
MIXING LANGUAGES
Real projects often combine languages. Each does what it's best at:
1. SQL to extract data from the database
2. Python to clean and transform
3. R to run statistical analysis
4. JavaScript to create the interactive visualization
5. Bash to schedule the pipeline
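The handoff between the first two steps, plus an export that a JavaScript chart could read, can be sketched in a single Python script. This is a simplified illustration using only the standard library: the table, the sample rows, and the helper names (`extract`, `clean`, `export_json`, `demo`) are hypothetical, and a real pipeline would more likely use pandas for step 2 and hand off to R for step 3.

```python
import json
import sqlite3


def extract(conn):
    """Step 1 (SQL): pull raw rows out of the database."""
    return conn.execute("SELECT donor_name, amount FROM contributions").fetchall()


def clean(rows):
    """Step 2 (Python): normalize names, skip missing amounts, total per donor."""
    totals = {}
    for name, amount in rows:
        if amount is None:
            continue  # drop rows with no recorded amount
        key = " ".join(name.split()).title()  # collapse whitespace, fix casing
        totals[key] = totals.get(key, 0.0) + float(amount)
    return totals


def export_json(totals):
    """Handoff for step 4: a JSON array that a browser chart could load."""
    records = [{"donor": d, "total": t} for d, t in sorted(totals.items())]
    return json.dumps(records)


def demo():
    """Run the sketch end to end on invented sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE contributions (donor_name TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO contributions VALUES (?, ?)",
            [
                ("  alice   example ", 100.0),
                ("Alice Example", 50.0),
                ("bob sample", None),
            ],
        )
        return export_json(clean(extract(conn)))
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Keeping each step in its own function makes it easier to swap one out later, for example replacing `clean` with a pandas version without touching the SQL or the export.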
09 // INIT_SEQUENCE
GETTING STARTED
Python
- INSTALL: python.org or Anaconda
- EDITOR: VS Code + Python extension
- FIRST: Clean a CSV with pandas
JavaScript
- INSTALL: None (runs in browser)
- EDITOR: VS Code
- FIRST: Chart with Chart.js
R
- INSTALL: r-project.org
- EDITOR: RStudio (free)
- FIRST: ggplot2 chart
SQL
- PRACTICE: SQLite or online playgrounds
- FIRST: Query a public dataset
Bash
- INSTALL: Already on Mac/Linux
- WINDOWS: Git Bash or WSL
- FIRST: Batch rename files