Manipulating DataFrames with Pandas - Python

Last Updated : 8 Apr, 2026

DataFrame manipulation in Pandas refers to performing operations such as viewing, cleaning, transforming, sorting and filtering tabular data. These operations help organize raw data into a structured and meaningful form that can be easily analyzed.

Note: This article uses a sample dataset, "country_code.csv", which is loaded into a DataFrame in the first example below.

Viewing the Data

Before manipulating a DataFrame, it is important to first look at the data to understand its structure, columns and values.

Python
import pandas as pd

# Load the sample dataset into a DataFrame and display it
df = pd.read_csv("country_code.csv")
print(df)

Output

Country Code Dataset
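
Printing a whole DataFrame is fine for a small file, but for larger data it is usually enough to look at a few rows and the column types. The sketch below assumes a small hand-made DataFrame in place of "country_code.csv"; its rows are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for country_code.csv (rows chosen for illustration)
df = pd.DataFrame({
    "Name": ["Canada", "India", "Japan", "Kenya", "Brazil", "France"],
    "Code": ["CA", "IN", "JP", "KE", "BR", "FR"],
})

first_rows = df.head(3)  # first 3 rows
last_rows = df.tail(2)   # last 2 rows
print(first_rows)
print(last_rows)

# Column names and dtypes give a quick structural overview
print(df.columns.tolist())
print(df.dtypes)
```

Calling head() and tail() without an argument returns 5 rows each.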

Checking the Size of the DataFrame

Understanding the size of a DataFrame is an important first step before performing any data operations. It helps you know how many records and features you are working with.

Python
print(df.shape)

Output

(249, 2)

Explanation: The shape attribute returns a tuple of the form (rows, columns). Here the DataFrame has 249 rows and 2 columns.
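
Because shape is a plain tuple, it can be unpacked directly. A minimal sketch, assuming a three-row illustrative DataFrame rather than the full dataset:

```python
import pandas as pd

# Illustrative data, not the real file
df = pd.DataFrame({
    "Name": ["Canada", "India", "Japan"],
    "Code": ["CA", "IN", "JP"],
})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")

# len(df) gives the row count only; df.size is rows * columns
print(len(df), df.size)
```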

Getting Summary Statistics

Summary statistics provide a quick numerical overview of the dataset. They help identify data ranges, averages, and possible outliers in numerical columns.

Python
print(df.describe())

Output

Statistical summary table

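Explanation: By default, describe() reports count, mean, standard deviation, min, max and quartiles for numerical columns. When a DataFrame has only text columns, as appears to be the case for the Name and Code columns used in this article, it reports count, unique, top and freq instead.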
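
What describe() returns depends on the column types. The sketch below assumes a hypothetical numeric column, "Value", added purely to show the difference; it is not part of the sample dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Canada", "India", "Japan", "Kenya"],
    "Code": ["CA", "IN", "JP", "KE"],
    "Value": [1.0, 2.0, 3.0, 4.0],  # hypothetical numeric column
})

# Only numeric columns are summarized by default
numeric_summary = df.describe()
print(numeric_summary)

# Text columns get count, unique, top and freq instead
text_summary = df[["Name", "Code"]].describe()
print(text_summary)

# include="all" summarizes every column in one table
print(df.describe(include="all"))
```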

Dropping Missing Values

Missing values can negatively impact analysis and lead to incorrect results. Removing such rows ensures cleaner and more reliable data for further processing.

Python
print(df.dropna())

Output

DataFrame without rows containing NaN values

Explanation: dropna() returns a new DataFrame without the rows that contain at least one missing (NaN) value. The original DataFrame is left unchanged.
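
One thing to watch for with country codes: by default, read_csv() treats the text "NA" as a missing value, so a code such as "NA" would be read as NaN and then removed by dropna(). The sketch below uses a small inline CSV (an assumption, standing in for the real file) to show the effect and one possible way around it.

```python
import io
import pandas as pd

# Inline CSV standing in for country_code.csv; the last row has a truly empty code
csv_text = "Name,Code\nIndia,IN\nNamibia,NA\nUnknown,\n"

# Default parsing: both "NA" and the empty field become NaN
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.isna().sum())
print(df_default.dropna())

# Treat only empty fields as missing, so the text "NA" is kept as-is
df_strict = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=[""]
)
print(df_strict.dropna())
```

Whether this applies to your copy of the dataset depends on its contents; checking df.isna().sum() before dropping rows is a cheap safeguard.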

Dropping Columns with Missing Values

In some cases, columns that contain missing data are not useful for analysis. Such columns can be removed to simplify the DataFrame.

Python
print(df.dropna(axis=1))

Output

DataFrame without columns containing NaN values

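Explanation: Using axis=1 tells Pandas to drop columns instead of rows. By default, a column is dropped if it contains at least one NaN value, not only when it is entirely empty.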
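
The how parameter controls how strict the column drop is. A minimal sketch, assuming a hypothetical "Notes" column that is entirely empty:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan"],
    "Code": ["IN", None, "JP"],         # one missing value
    "Notes": [np.nan, np.nan, np.nan],  # hypothetical, entirely empty column
})

# Default (how="any"): drop every column with at least one missing value
any_missing = df.dropna(axis=1)
print(any_missing.columns.tolist())

# how="all": drop only columns in which every value is missing
all_missing = df.dropna(axis=1, how="all")
print(all_missing.columns.tolist())
```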

Merging DataFrames

Merging lets you combine data from multiple DataFrames based on a common column. This is especially useful when related data is stored in separate files.

Python
df1 = pd.read_csv("country_code.csv")

# Creating another sample DataFrame
df2 = pd.DataFrame({"Name": ["India", "United States", "Canada"],
                    "Continent": ["Asia", "North America", "North America"]})
res = pd.merge(df1, df2, on="Name")
print(res)

Output

Merging Dataframes

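Explanation: Country codes are merged with continent information on the shared "Name" column. Because pd.merge() performs an inner join by default, the result contains only the countries whose names appear in both DataFrames.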
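
To keep rows that have no match, the join type can be changed with the how parameter. The sketch below assumes a three-row df1 in place of the full CSV and uses indicator=True to show where each row came from.

```python
import pandas as pd

# Illustrative stand-in for the country code data
df1 = pd.DataFrame({
    "Name": ["India", "Canada", "Japan"],
    "Code": ["IN", "CA", "JP"],
})
df2 = pd.DataFrame({
    "Name": ["India", "United States", "Canada"],
    "Continent": ["Asia", "North America", "North America"],
})

# Default inner join: only names present in both DataFrames
inner = pd.merge(df1, df2, on="Name")
print(inner)

# Left join: keep every row of df1; unmatched rows get NaN in Continent
left = pd.merge(df1, df2, on="Name", how="left", indicator=True)
print(left)
```

The _merge column added by indicator=True marks each row as "both", "left_only" or "right_only", which helps spot names that failed to match.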

Renaming Columns

Clear and meaningful column names make a DataFrame easier to understand and work with. Renaming columns improves readability without changing the underlying data.

Python
res = df.rename(columns={"Name": "CountryName", "Code": "CountryCode"})
print(res)

Output

DataFrame with renamed columns

Explanation: The rename() method returns a new DataFrame with the updated column names. The original DataFrame is modified only if inplace=True is specified.
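
By default, rename() silently ignores keys that do not match any column, so a typo can go unnoticed. A small sketch, assuming a two-row illustrative DataFrame, that shows the original staying unchanged and how errors="raise" surfaces a misspelled name:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["India", "Canada"], "Code": ["IN", "CA"]})

renamed = df.rename(columns={"Name": "CountryName", "Code": "CountryCode"})
print(df.columns.tolist())       # original column names are unchanged
print(renamed.columns.tolist())  # new DataFrame carries the new names

# "Nmae" is a deliberate typo; errors="raise" turns it into a KeyError
typo_caught = False
try:
    df.rename(columns={"Nmae": "CountryName"}, errors="raise")
except KeyError:
    typo_caught = True
print(typo_caught)
```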

Sorting the Data

Sorting data helps arrange values in a logical order, making patterns and comparisons easier to observe.

Python
print(df.sort_values(by="Name"))

Output

Sorted DataFrame

Explanation: The sort_values() method returns a new DataFrame with rows ordered by the specified column, in ascending order by default.
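
Sorting keeps each row's original index label. The sketch below, on an illustrative three-row DataFrame, shows descending order and how reset_index(drop=True) renumbers the rows after sorting.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Japan", "Canada", "India"],
    "Code": ["JP", "CA", "IN"],
})

ascending = df.sort_values(by="Name")
print(ascending)  # index labels follow the rows: 1, 2, 0

descending = df.sort_values(by="Name", ascending=False)
print(descending)

# Renumber rows 0..n-1 after sorting
reindexed = df.sort_values(by="Name").reset_index(drop=True)
print(reindexed)
```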

Filtering Rows Using Conditions

Filtering lets you extract only the rows that satisfy a specific condition. This helps focus the analysis on relevant data.

Python
print(df[df["Code"] == "IN"])

Output

Filtered rows

Explanation: The condition inside df[...] produces a boolean Series, and only the rows where it is True are kept. Here, that means the rows whose "Code" value equals "IN".
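
Conditions can also be combined or matched against several values. A minimal sketch on an illustrative four-row DataFrame (not the full dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["India", "Canada", "Japan", "Indonesia"],
    "Code": ["IN", "CA", "JP", "ID"],
})

# Single condition, as in the example above
india = df[df["Code"] == "IN"]
print(india)

# isin() matches any value from a list
selected = df[df["Code"].isin(["IN", "JP"])]
print(selected)

# Combine conditions with & (and) or | (or); wrap each comparison in parentheses
combined = df[df["Name"].str.startswith("I") & (df["Code"] != "IN")]
print(combined)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.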
