Manipulating DataFrames with Pandas - Python

DataFrame manipulation in Pandas refers to performing operations such as viewing, cleaning, transforming, sorting and filtering tabular data. These operations help organize raw data into a structured and meaningful form that can be easily analyzed.

Note: This article uses a sample dataset, "country_code.csv", which contains the columns "Name" and "Code".

Viewing the Data

Before manipulating a DataFrame, it is important to first look at the data to understand its structure, columns and values.

Python

import pandas as pd
df = pd.read_csv("country_code.csv")
print(df)

Output

(Image: Country Code Dataset)
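Printing a whole DataFrame is often more than needed. The sketch below shows a few lighter ways to inspect the data. It builds a small stand-in DataFrame instead of reading "country_code.csv", so the rows are illustrative and are not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for country_code.csv (illustrative rows only)
df = pd.DataFrame({
    "Name": ["India", "United States", "Canada", "Brazil", "Japan", "Kenya"],
    "Code": ["IN", "US", "CA", "BR", "JP", "KE"],
})

first_rows = df.head(3)   # first 3 rows
last_rows = df.tail(2)    # last 2 rows

print(first_rows)
print(last_rows)
print(df.columns.tolist())  # column labels as a plain list
print(df.dtypes)            # data type of each column
df.info()                   # column names, non-null counts and dtypes
```

head() and tail() return new DataFrames, so they can be stored or chained like any other result.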

Checking the Size of the DataFrame

Understanding the size of a DataFrame is an important first step before performing any data operations. It helps you know how many records and features you are working with.

Python

print(df.shape)

Output

(249, 2)

Explanation: The shape attribute returns a tuple of the form (rows, columns). Here, the DataFrame has 249 rows and 2 columns.
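Because shape is a plain tuple, it can be unpacked into separate variables. A minimal sketch, using a small illustrative DataFrame in place of the CSV file:

```python
import pandas as pd

# Small illustrative DataFrame (not the real dataset)
df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan"],
    "Code": ["IN", "CA", "JP"],
})

n_rows, n_cols = df.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)

print(len(df))   # number of rows only
print(df.size)   # total number of cells (rows * columns)
```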

Getting Summary Statistics

Summary statistics provide a quick numerical overview of the dataset. They help identify data ranges, averages, and possible outliers in numerical columns.

Python

print(df.describe())

Output

(Image: Statistical summary table)

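Explanation: For numerical columns, describe() computes the count, mean, standard deviation, min, max and quartiles. When a DataFrame has no numerical columns, it reports count, unique, top and freq instead.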
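The difference between numeric and text summaries can be sketched as follows. The "Population" column and its values are hypothetical and were added only so that there is a numeric column to summarize.

```python
import pandas as pd

# Illustrative data; "Population" is a made-up numeric column
df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan", "Brazil"],
    "Code": ["IN", "CA", "JP", "BR"],
    "Population": [10, 20, 30, 40],
})

# By default only numeric columns are summarized
numeric_summary = df.describe()
print(numeric_summary)

# Text-only columns get a different set of statistics
text_summary = df[["Name", "Code"]].describe()
print(text_summary)
```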

Dropping Missing Values

Missing values can negatively impact analysis and lead to incorrect results. Removing such rows ensures cleaner and more reliable data for further processing.

Python

print(df.dropna())

Output

(Image: DataFrame without rows containing NaN values)

Explanation: dropna() removes every row that contains at least one missing (NaN) value. It returns a new DataFrame and leaves the original unchanged.
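It can help to count missing values before dropping anything, and to restrict the check to the columns that matter. A minimal sketch with illustrative data that deliberately contains gaps:

```python
import pandas as pd

# Illustrative data with deliberately missing entries
df = pd.DataFrame({
    "Name": ["India", None, "Canada", "Japan"],
    "Code": ["IN", "US", None, "JP"],
})

# Count missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop rows with a missing value in any column
cleaned = df.dropna()
print(cleaned)

# Drop rows only when "Code" is missing
by_code = df.dropna(subset=["Code"])
print(by_code)
```

The subset parameter keeps rows that are incomplete elsewhere but still usable for the analysis at hand.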

Dropping Columns with Missing Values

In some cases, a column contains missing data and is not useful for analysis. Such columns can be removed to simplify the DataFrame.

Python

print(df.dropna(axis=1))

Output

(Image: DataFrame without columns containing NaN values)

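Explanation: axis=1 tells Pandas to drop columns instead of rows. By default, a column is dropped if it contains at least one missing value; pass how="all" to drop only columns that are entirely empty.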
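The effect of the how parameter on column-wise dropping can be sketched as below. The "Notes" column is hypothetical and exists only to show a column with no values at all.

```python
import pandas as pd

# Illustrative data: "Code" has one gap, "Notes" is entirely empty
df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan"],
    "Code": ["IN", None, "JP"],
    "Notes": [None, None, None],
})

# Default (how="any"): drop every column with at least one missing value
any_missing = df.dropna(axis=1)
print(any_missing.columns.tolist())

# how="all": drop only columns where every value is missing
all_missing = df.dropna(axis=1, how="all")
print(all_missing.columns.tolist())
```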

Merging DataFrames

Merging lets you combine data from multiple DataFrames based on a common column. This is especially useful when related data is stored in separate files.

Python

df1 = pd.read_csv("country_code.csv")

# Creating another sample DataFrame
df2 = pd.DataFrame({"Name": ["India", "United States", "Canada"],
                    "Continent": ["Asia", "North America", "North America"]})
res = pd.merge(df1, df2, on="Name")
print(res)

Output

(Image: Merging DataFrames)

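Explanation: Country codes are merged with continent information using the shared "Name" column. Because merge() performs an inner join by default, only countries present in both DataFrames appear in the result.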
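To keep countries that have no match in the second DataFrame, a left join can be used instead. The sketch below uses a small stand-in for df1 rather than the CSV file, and adds indicator=True to show where each row came from.

```python
import pandas as pd

# Illustrative stand-in for the country code data
df1 = pd.DataFrame({
    "Name": ["India", "United States", "Canada", "Japan"],
    "Code": ["IN", "US", "CA", "JP"],
})
df2 = pd.DataFrame({
    "Name": ["India", "United States", "Canada"],
    "Continent": ["Asia", "North America", "North America"],
})

# Inner join (default): only names found in both DataFrames
inner = pd.merge(df1, df2, on="Name")
print(inner)

# Left join: keep every row of df1; unmatched rows get NaN for "Continent"
left = pd.merge(df1, df2, on="Name", how="left", indicator=True)
print(left)
```

The "_merge" column added by indicator=True is a quick way to spot rows that found no match.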

Renaming Columns

Clear and meaningful column names make a DataFrame easier to understand and work with. Renaming columns improves readability without changing the underlying data.

Python

res = df.rename(columns={"Name": "CountryName", "Code": "CountryCode"})
print(res)

Output

(Image: DataFrame with renamed columns)

Explanation: The rename() method returns a new DataFrame with updated column names unless inplace=True is specified, in which case the original DataFrame is modified.
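The sketch below illustrates that rename() leaves the original untouched, and that keys which do not match any column are ignored by default. The DataFrame and the "Missing" key are illustrative.

```python
import pandas as pd

# Illustrative data (not the real dataset)
df = pd.DataFrame({"Name": ["India", "Canada"], "Code": ["IN", "CA"]})

renamed = df.rename(columns={"Name": "CountryName", "Code": "CountryCode"})
print(renamed.columns.tolist())  # new names
print(df.columns.tolist())       # original names are unchanged

# A key that matches no column is skipped without an error by default
unchanged = df.rename(columns={"Missing": "Anything"})
print(unchanged.columns.tolist())
```

Because a mistyped key is skipped silently, it is worth checking the resulting column names after renaming.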

Sorting the Data

Sorting data helps arrange values in a logical order, making patterns and comparisons easier to observe.

Python

print(df.sort_values(by="Name"))

Explanation: The sort_values() method arranges rows by the values in the specified column, in ascending order by default. It returns a new DataFrame and leaves the original unchanged.
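Sorted rows keep their original index labels. A minimal sketch of ascending and descending sorts, using illustrative data, with reset_index(drop=True) to renumber the rows afterwards:

```python
import pandas as pd

# Illustrative data (not the real dataset)
df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan", "Brazil"],
    "Code": ["IN", "CA", "JP", "BR"],
})

# Ascending sort; the original index labels travel with the rows
ascending = df.sort_values(by="Name")
print(ascending)

# Descending sort, then renumber the index from 0
descending = df.sort_values(by="Name", ascending=False).reset_index(drop=True)
print(descending)
```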

Filtering Rows Using Conditions

Filtering lets you extract only the rows that satisfy a specific condition. This helps focus the analysis on relevant data.

Python

print(df[df["Code"] == "IN"])

Explanation: The condition inside df[...] produces a Boolean mask, and only the rows where the mask is True are kept. Here, that is every row whose "Code" value equals "IN".
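Conditions can also match several values at once or be combined. The sketch below uses illustrative data; note that each condition needs its own parentheses when combined with & (and) or | (or).

```python
import pandas as pd

# Illustrative data (not the real dataset)
df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan", "Chile"],
    "Code": ["IN", "CA", "JP", "CL"],
})

# Single condition
exact = df[df["Code"] == "IN"]
print(exact)

# Match any value from a list
several = df[df["Code"].isin(["IN", "JP"])]
print(several)

# Combine conditions: names starting with "C" whose code is not "CA"
combined = df[(df["Name"].str.startswith("C")) & (df["Code"] != "CA")]
print(combined)
```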
