LANGUAGE_GUIDE

When to use Python, JavaScript, R, SQL, or Bash. A tactical analysis for newsroom deployments.

01 // OVERVIEW

QUICK REFERENCE
Objective             Protocol    Rationale
Data cleaning         PYTHON      pandas library, readable syntax
Statistical analysis  R           Built for stats, publication output
Web scraping          PYTHON      BeautifulSoup, Playwright
Interactive viz       JAVASCRIPT  D3.js, runs in browsers
File automation       BASH        System native, fast execution
Database queries      SQL         Purpose-built for databases

02 // PYTHON

GENERAL PURPOSE

Capabilities

  • > Data cleaning and transformation
  • > Web scraping
  • > API integrations
  • > PDF processing

Advantages

  • > Readable syntax
  • > Huge ecosystem
  • > Cross-platform

USE_CASE.py
import pandas as pd

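# Load campaign finance data, parse the dates, and total contributions per month
df = pd.read_csv("contributions.csv")
df['date'] = pd.to_datetime(df['date'])
# Group by year-month period so the same month in different years stays separate
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()
print(monthly)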
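
The cleaning step itself is where most of the work usually goes. Here is a minimal sketch of it using only the standard library, so each decision is visible. The sample rows, the column names, and the helper names `clean_rows` and `monthly_totals` are hypothetical stand-ins, not part of any real dataset or library.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows standing in for contributions.csv
SAMPLE = """donor_name,date,amount
  Jane Doe ,2023-12-30,"$1,200.00"
JANE DOE,2024-01-05,300
John Roe,2024-01-17,$50.50
John Roe,not a date,75
"""


def clean_rows(text):
    """Yield (donor, date, amount) tuples, skipping rows that fail to parse."""
    for row in csv.DictReader(io.StringIO(text)):
        try:
            day = datetime.strptime(row["date"].strip(), "%Y-%m-%d").date()
            amount = float(row["amount"].replace("$", "").replace(",", ""))
        except ValueError:
            continue  # a real pipeline would log these rows for manual review
        # Collapse stray whitespace and normalize capitalization
        donor = " ".join(row["donor_name"].split()).title()
        yield donor, day, amount


def monthly_totals(rows):
    """Sum amounts per (year, month) so different years stay separate."""
    totals = defaultdict(float)
    for _, day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


totals = monthly_totals(clean_rows(SAMPLE))
print(totals)
```

With pandas the same steps are shorter, but the choices are the same: what counts as a valid date, how to parse currency strings, and what to do with rows that fail.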

03 // JAVASCRIPT

WEB INTERACTION

Capabilities

  • > Interactive visualizations
  • > Browser-based tools
  • > Maps and charts

Advantages

  • > Runs in browser
  • > No user installation
  • > D3.js ecosystem

VISUALIZE.js
// Interactive bar chart with D3
d3.csv("data.csv").then(data => {
  const svg = d3.select("#chart")
    .append("svg")
    .attr("width", 600)
    .attr("height", 400);

  // ... bindings and rendering
});

04 // R_LANG

STATISTICS

Capabilities

  • > Statistical analysis
  • > Publication-quality plots
  • > Reproducible research

Advantages

  • > Built for stats
  • > ggplot2 engine
  • > Tidyverse syntax

ANALYSIS.r
library(tidyverse)

# Analyze election results with statistical summary
results <- read_csv("election_data.csv")
results %>%
  group_by(county) %>%
  summarize(
    avg_turnout = mean(turnout),
    margin = mean(dem_votes - rep_votes)
  ) %>%
  ggplot(aes(x = avg_turnout, y = margin)) +
  geom_point() +
  geom_smooth(method = "lm")

05 // SQL

DATABASE QUERIES

Capabilities

  • > Querying databases
  • > Filtering and aggregating large datasets
  • > Joining tables
  • > Data extraction from newsroom databases

Advantages

  • > Purpose-built for data queries
  • > Fast on large datasets, since filtering and aggregation run inside the database
  • > Works with most data systems
  • > Skills transfer across databases

TOP_DONORS.sql
-- Find top donors by total contributions
SELECT
    donor_name,
    COUNT(*) as num_contributions,
    SUM(amount) as total_amount
FROM contributions
WHERE date >= '2024-01-01'
GROUP BY donor_name
ORDER BY total_amount DESC
LIMIT 20;
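
To try a query like this without access to a newsroom database, one option is an in-memory SQLite database driven from Python's built-in `sqlite3` module. The sketch below assumes a `contributions` table with `donor_name`, `date`, and `amount` columns and dates stored as ISO text. The sample rows and the `top_donors` helper are hypothetical.

```python
import sqlite3

# Same shape as the query above, with the cutoff date and limit as parameters
QUERY = """
SELECT donor_name,
       COUNT(*)    AS num_contributions,
       SUM(amount) AS total_amount
FROM contributions
WHERE date >= ?
GROUP BY donor_name
ORDER BY total_amount DESC
LIMIT ?;
"""


def top_donors(conn, since, limit=20):
    """Return (donor_name, num_contributions, total_amount) rows."""
    return conn.execute(QUERY, (since, limit)).fetchall()


# Hypothetical sample data in a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (donor_name TEXT, date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [
        ("Jane Doe", "2023-12-30", 1200.0),
        ("Jane Doe", "2024-01-05", 300.0),
        ("John Roe", "2024-01-17", 50.5),
        ("John Roe", "2024-02-01", 75.0),
    ],
)

rows = top_donors(conn, "2024-01-01")
for row in rows:
    print(row)
```

Passing the date and limit as `?` parameters, rather than pasting them into the query string, keeps the query reusable and avoids quoting mistakes.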

06 // BASH

FILE AUTOMATION

Capabilities

  • > Quick file operations
  • > Batch renaming
  • > Combining command-line tools
  • > Scheduled tasks (cron jobs)

Advantages

  • > Already on Mac/Linux
  • > Extremely fast for simple tasks
  • > Pipes let you chain tools
  • > Great for automation scripts

FILE_OPS.sh
# Add a "2024-12-" prefix to every PDF in the current directory
for f in *.pdf; do
  [ -e "$f" ] || continue   # skip the literal pattern when no PDFs match
  mv -- "$f" "2024-12-$f"
done

# Count lines in all CSV files
wc -l *.csv

# Find all files modified in the last 24 hours
find . -mtime 0 -type f
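
Where Bash is not available, the same batch rename can be sketched in Python with `pathlib`. The `add_date_prefix` helper and the file names below are hypothetical, and the demonstration runs inside a temporary directory so nothing real is renamed.

```python
import tempfile
from pathlib import Path


def add_date_prefix(folder, prefix, pattern="*.pdf"):
    """Prefix every matching file name in folder and return the new names."""
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(folder).glob(pattern)):
        if path.name.startswith(prefix):
            continue  # already prefixed, so rerunning is safe
        target = path.with_name(prefix + path.name)
        path.rename(target)
        renamed.append(target.name)
    return renamed


# Demonstration on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("budget.pdf", "minutes.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    renamed = add_date_prefix(tmp, "2024-12-")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)
print(remaining)
```

The prefix check is a design choice: it makes the function safe to run twice, which the plain loop above is not.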

07 // LOGIC_TREE

SELECTION ALGORITHM
What are you doing?
│
├─► Working with data in spreadsheets/CSVs?
│   └─► Python (pandas)
│
├─► Need statistical analysis or p-values?
│   └─► R
│
├─► Building something for a web page?
│   └─► JavaScript
│
├─► Querying a database?
│   └─► SQL
│
├─► Quick file operations?
│   └─► Bash
│
└─► Not sure?
    └─► START WITH PYTHON
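
The tree above can also be read as an ordered list of rules with a default. The following is a rough sketch of that reading. The keyword lists are illustrative assumptions rather than part of the guide, and the first matching rule wins, in the same order as the tree.

```python
def pick_language(task):
    """Return the suggested language for a short task description."""
    # Hypothetical keywords for each branch of the tree, checked in order
    rules = [
        (("spreadsheet", "csv"), "Python (pandas)"),
        (("statistic", "p-value"), "R"),
        (("web page", "browser"), "JavaScript"),
        (("database", "query"), "SQL"),
        (("file", "rename"), "Bash"),
    ]
    text = task.lower()
    for keywords, language in rules:
        if any(keyword in text for keyword in keywords):
            return language
    return "Python"  # "Not sure? Start with Python"


for task in (
    "Clean a CSV of contributions",
    "Need p-values for turnout",
    "Build a chart for a web page",
    "Pull records from the court database",
    "Batch rename PDFs",
    "Something else entirely",
):
    print(task, "->", pick_language(task))
```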

08 // PIPELINE_INTEGRATION

MIXING LANGUAGES

Real projects often combine languages. Each does what it's best at:

1. SQL to extract data from the database
2. Python to clean and transform
3. R to run statistical analysis
4. JavaScript to create an interactive visualization
5. Bash to schedule the pipeline
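
Languages in a pipeline like this usually hand off through files or a database, not direct calls. The sketch below covers steps 1, 2, and the hand-off to step 4: SQL extracts rows, Python cleans them, and the result is serialized as JSON that a browser chart could load. The `results` table, its columns, the sample rows, and the function names are all hypothetical. The R and Bash steps are left out.

```python
import json
import sqlite3


def extract(conn):
    """Step 1 (SQL): pull raw rows out of the database."""
    query = "SELECT county, turnout FROM results ORDER BY rowid"
    return conn.execute(query).fetchall()


def clean(rows):
    """Step 2 (Python): drop missing turnout values, normalize county names."""
    return [
        {"county": county.strip().title(), "turnout": float(turnout)}
        for county, turnout in rows
        if turnout is not None
    ]


def export_for_chart(records):
    """Hand-off to step 4: JSON text that a JavaScript chart could fetch."""
    return json.dumps(records, indent=2)


# Hypothetical sample data in a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (county TEXT, turnout REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(" adams ", 0.61), ("BAKER", None), ("clark", 0.48)],
)

payload = export_for_chart(clean(extract(conn)))
print(payload)
```

Keeping each stage as a function that takes plain data in and returns plain data out makes it easier to swap one stage for another language later.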

09 // INIT_SEQUENCE

GETTING STARTED

Python

  • INSTALL: python.org or Anaconda
  • EDITOR: VS Code + Python extension
  • FIRST: Clean a CSV with pandas

JavaScript

  • INSTALL: None (runs in browser)
  • EDITOR: VS Code
  • FIRST: Chart with Chart.js

R

  • INSTALL: r-project.org
  • EDITOR: RStudio (free)
  • FIRST: ggplot2 chart

SQL

  • PRACTICE: SQLite or online playgrounds
  • FIRST: Query a public dataset

Bash

  • INSTALL: Already on Mac/Linux
  • WINDOWS: Git Bash or WSL
  • FIRST: Batch rename files