Excel, SQL, and Pandas: One-to-One Cheat Sheet for Everyday Data Tasks

A compact, side-by-side reference that maps common data tasks across ExcelSQL, and Python (Pandas). Copy/paste the snippets and adapt table/column names to your data.

A lot of us think in Excel, work with data stored in SQL databases, and script repeatable workflows in Python (Pandas). This cheat sheet lines those worlds up so you can translate muscle memory across tools: if you know how to do it in one, you’ll see the equivalent in the others. It focuses on everyday tasks—loading data, filtering, selecting, sorting, aggregating, joining, creating new columns, handling missing values, exporting, and plotting—using short, copy-pasteable patterns.

What's included

  • Loading data, selecting columns, filtering, sorting
  • Grouping/aggregations, joins, computed columns
  • Missing data handling, exporting, and quick charts

Who this is for

Analysts, engineers, and anyone who bounces between CSVs, databases, and notebooks. If you know one of these tools, this guide helps you move just as fast in the others.

TL;DR: Same task, three idiomatic snippets—minimize guessing and edge-case surprises.

When to use what

  • Excel: quick ad-hoc exploration, lightweight cleanup, and presentation; great for small to medium files and one-off pivots.
  • SQL: the data already lives in a database; you need set-based operations, scalable joins/aggregations, and shared, reproducible queries.
  • Pandas: you want programmable analysis, feature engineering, automation, or to integrate with Python’s ecosystem (files, APIs, ML).

How to use this page

  • Pick the task (row) you need—filter, join, group, etc.
  • Copy the snippet from the column of the tool you’re using.
  • Compare columns to learn the equivalent in the other tools.

How to read this table

Each row is a task. Columns show a minimally idiomatic example for each tool. Replace table_namecolumn, and ranges with your own names. SQL examples are ANSI-style and may need small tweaks for your dialect (Postgres, MySQL, SQLite, SQL Server, etc.). Pandas examples assume:

Notes

  • SQL examples are ANSI-like and may vary by dialect (Postgres, MySQL, SQL Server, SQLite, etc.).
  • Pandas assumes import pandas as pd and a DataFrame named df unless otherwise shown.
  • Excel ranges are illustrative—adjust to your sheet.
TaskExcelSQLPython (Pandas)
Load DataOpen Excel file or use File → OpenSELECT * FROM table_name;df = pd.read_csv("file.csv") (or pd.read_excel("file.xlsx"))
Filter Rows=FILTER(A2:B10, B2:B10>100)SELECT * FROM table WHERE column > 100;df[df["column"] > 100]
Select ColumnsUse column letters (e.g., A, B)SELECT column1, column2 FROM table;df[["column1", "column2"]]
Sort DataData → SortSELECT * FROM table ORDER BY column DESC;df.sort_values(by="column", ascending=False)
Group By / AggregateUse Pivot TableSELECT dept, COUNT(*) FROM emp GROUP BY dept;df.groupby("dept").size() or df.groupby("dept").agg({...})
Count Rows=COUNTA(A2:A100)SELECT COUNT(*) FROM table;len(df) or df.shape[0]
Average / Mean=AVERAGE(B2:B100)SELECT AVG(salary) FROM emp;df["salary"].mean()
SUM=SUM(B2:B100)SELECT SUM(sales) FROM data;df["sales"].sum()
Remove DuplicatesData → Remove DuplicatesSELECT DISTINCT column FROM table;df.drop_duplicates()
Join TablesUse VLOOKUP or XLOOKUPSELECT * FROM A JOIN B ON A.id = B.id;pd.merge(df1, df2, on="id")
Create New ColumnIn new column: =B2*0.1SELECT salary, salary*0.1 AS bonus FROM emp;df["bonus"] = df["salary"] * 0.1
Rename ColumnRename manuallySELECT column AS new_name FROM table;df.rename(columns={"old": "new"})
Handle Missing Data=IF(ISBLANK(A2), "N/A", A2)Use IS NULLCOALESCE()df.fillna("N/A") or df.dropna()
Export DataFile → Save As (CSV/XLSX)Use client tool or COPY/INTO OUTFILE (dialect)df.to_csv("output.csv", index=False)
Data VisualizationInsert → ChartsNot native; use BI toolsdf.plot(kind="bar") (or matplotlib/seaborn)

Quick Tips

  • Type safety: In Pandas, check dtypes with df.dtypes; cast with df.astype({...}) to avoid silent string/number issues.
  • Filtering nulls: In SQL use WHERE column IS NOT NULL. In Pandas use df[df["column"].notna()].
  • Multiple conditions: SQL uses AND/OR; Pandas uses bitwise & / | with parentheses, e.g. df[(df.a > 0) & (df.b == "X")].
  • Indexes: Sorting/grouping in Pandas can be faster if the key column is set as index: df.set_index("id", inplace=True).
  • Large CSVs: Use pd.read_csv(..., chunksize=100_000) to stream in batches.




Comments

Popular posts from this blog

Mount StorageBox to the server for backup

psql: error: connection to server at "localhost" (127.0.0.1), port 5433 failed: ERROR: failed to authenticate with backend using SCRAM DETAIL: valid password not found

Keeping Your SSH Connections Alive: Configuring ServerAliveInterval and ServerAliveCountMax