Excel, SQL, and Pandas: One-to-One Cheat Sheet for Everyday Data Tasks
A compact, side-by-side reference that maps common data tasks across Excel, SQL, and Python (Pandas). Copy/paste the snippets and adapt table/column names to your data.
A lot of us think in Excel, work with data stored in SQL databases, and script repeatable workflows in Python (Pandas). This cheat sheet lines those worlds up so you can translate muscle memory across tools: if you know how to do it in one, you’ll see the equivalent in the others. It focuses on everyday tasks—loading data, filtering, selecting, sorting, aggregating, joining, creating new columns, handling missing values, exporting, and plotting—using short, copy-pasteable patterns.
What's included
- Loading data, selecting columns, filtering, sorting
- Grouping/aggregations, joins, computed columns
- Missing data handling, exporting, and quick charts
Who this is for
Analysts, engineers, and anyone who bounces between CSVs, databases, and notebooks. If you know one of these tools, this guide helps you move just as fast in the others.
TL;DR: Same task, three idiomatic snippets—minimize guessing and edge-case surprises.
When to use what
- Excel: quick ad-hoc exploration, lightweight cleanup, and presentation; great for small to medium files and one-off pivots.
- SQL: the data already lives in a database; you need set-based operations, scalable joins/aggregations, and shared, reproducible queries.
- Pandas: you want programmable analysis, feature engineering, automation, or to integrate with Python’s ecosystem (files, APIs, ML).
How to use this page
- Pick the task (row) you need—filter, join, group, etc.
- Copy the snippet from the column of the tool you’re using.
- Compare columns to learn the equivalent in the other tools.
How to read this table
Each row is a task. Columns show a minimally idiomatic example for each tool. Replace table_name
, column
, and ranges with your own names. SQL examples are ANSI-style and may need small tweaks for your dialect (Postgres, MySQL, SQLite, SQL Server, etc.). Pandas examples assume:
Notes
- SQL examples are ANSI-like and may vary by dialect (Postgres, MySQL, SQL Server, SQLite, etc.).
- Pandas assumes
import pandas as pd
and a DataFrame nameddf
unless otherwise shown.- Excel ranges are illustrative—adjust to your sheet.
Task | Excel | SQL | Python (Pandas) |
---|---|---|---|
Load Data | Open Excel file or use File → Open | SELECT * FROM table_name; | df = pd.read_csv("file.csv") (or pd.read_excel("file.xlsx") ) |
Filter Rows | =FILTER(A2:B10, B2:B10>100) | SELECT * FROM table WHERE column > 100; | df[df["column"] > 100] |
Select Columns | Use column letters (e.g., A, B) | SELECT column1, column2 FROM table; | df[["column1", "column2"]] |
Sort Data | Data → Sort | SELECT * FROM table ORDER BY column DESC; | df.sort_values(by="column", ascending=False) |
Group By / Aggregate | Use Pivot Table | SELECT dept, COUNT(*) FROM emp GROUP BY dept; | df.groupby("dept").size() or df.groupby("dept").agg({...}) |
Count Rows | =COUNTA(A2:A100) | SELECT COUNT(*) FROM table; | len(df) or df.shape[0] |
Average / Mean | =AVERAGE(B2:B100) | SELECT AVG(salary) FROM emp; | df["salary"].mean() |
SUM | =SUM(B2:B100) | SELECT SUM(sales) FROM data; | df["sales"].sum() |
Remove Duplicates | Data → Remove Duplicates | SELECT DISTINCT column FROM table; | df.drop_duplicates() |
Join Tables | Use VLOOKUP or XLOOKUP | SELECT * FROM A JOIN B ON A.id = B.id; | pd.merge(df1, df2, on="id") |
Create New Column | In new column: =B2*0.1 | SELECT salary, salary*0.1 AS bonus FROM emp; | df["bonus"] = df["salary"] * 0.1 |
Rename Column | Rename manually | SELECT column AS new_name FROM table; | df.rename(columns={"old": "new"}) |
Handle Missing Data | =IF(ISBLANK(A2), "N/A", A2) | Use IS NULL , COALESCE() | df.fillna("N/A") or df.dropna() |
Export Data | File → Save As (CSV/XLSX) | Use client tool or COPY/INTO OUTFILE (dialect) | df.to_csv("output.csv", index=False) |
Data Visualization | Insert → Charts | Not native; use BI tools | df.plot(kind="bar") (or matplotlib/seaborn) |
Quick Tips
- Type safety: In Pandas, check dtypes with
df.dtypes
; cast withdf.astype({...})
to avoid silent string/number issues. - Filtering nulls: In SQL use
WHERE column IS NOT NULL
. In Pandas usedf[df["column"].notna()]
. - Multiple conditions: SQL uses
AND/OR
; Pandas uses bitwise&
/|
with parentheses, e.g.df[(df.a > 0) & (df.b == "X")]
. - Indexes: Sorting/grouping in Pandas can be faster if the key column is set as index:
df.set_index("id", inplace=True)
. - Large CSVs: Use
pd.read_csv(..., chunksize=100_000)
to stream in batches.
Comments
Post a Comment