Practical Examples: SQL Operations with Pandas
SELECT
SELECT column1, column2 FROM table WHERE condition;
Pandas
import pandas as pd df_selected = df[['column1', 'column2']][df['condition']]
WHERE
SELECT * FROM table WHERE condition;
Pandas
df_filtered = df[df['condition']]
ORDER BY
SELECT * FROM table ORDER BY column DESC;
Pandas
df_sorted = df.sort_values(by='column', ascending=False)
GROUP BY
SELECT column, COUNT(*), AVG(value) FROM table GROUP BY column;
Pandas
df_grouped = df.groupby('column').agg({'column': 'count', 'value': 'mean'})
JOIN
SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id;
Pandas
df_merged = pd.merge(table1, table2, on='id', how='inner')
INSERT
INSERT INTO table (column1, column2) VALUES (value1, value2);
Pandas
new_row = {'column1': value1, 'column2': value2} df = df.append(new_row, ignore_index=True)
UPDATE
UPDATE table SET column = value WHERE condition;
Pandas
df.loc[df['condition'], 'column'] = value
DELETE
DELETE FROM table WHERE condition;
Pandas
df = df[~df['condition']]
Complete example with fictional DataFrame
import pandas as pd # Creating a fictional DataFrame data = {'ID': [1, 2, 3, 4, 5], 'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'], 'Age': [25, 30, 22, 35, 28], 'Salary': [50000, 60000, 45000, 70000, 55000]} df = pd.DataFrame(data) # Displaying the original DataFrame print("Original DataFrame:") print(df) print("\n") # 1. SELECT selected_data = df[['ID', 'Name', 'Salary']][df['Age'] > 25] print("1. SELECT WHERE Age > 25:") print(selected_data) print("\n") # 2. ORDER BY sorted_data = df.sort_values(by='Salary', ascending=False) print("2. ORDER BY Salary DESC:") print(sorted_data) print("\n") # 3. GROUP BY and aggregation grouped_data = df.groupby('Age').agg({'Salary': 'mean', 'Name': 'count'}).reset_index() print("3. GROUP BY Age, AVG(Salary), COUNT(Name):") print(grouped_data) print("\n") # 4. JOIN df2 = pd.DataFrame({'ID': [1, 2, 3, 6], 'Department': ['HR', 'IT', 'Marketing', 'Finance']}) merged_data = pd.merge(df, df2, on='ID', how='left') print("4. JOIN with Departments:") print(merged_data) print("\n") # 5. INSERT new_row = {'ID': 6, 'Name': 'Frank', 'Age': 32, 'Salary': 62000, 'Department': 'Sales'} df = df.append(new_row, ignore_index=True) print("5. INSERT - Adding Frank to the DataFrame:") print(df) print("\n") # 6. UPDATE df.loc[df['Name'] == 'Bob', 'Salary'] = 65000 print("6. UPDATE - Bob's Salary Updated:") print(df) print("\n") # 7. DELETE df = df[df['Salary'] < 60000] print("7. DELETE - Removing Employees with Salary < 60000:") print(df) print("\n")
Conclusion
Pandas not only replicates SQL functionalities but also offers a rich set of tools for data exploration, making it an invaluable asset in the toolkit of data professionals. From SELECT queries to JOIN operations, the versatility of Pandas shines through, providing an intuitive and powerful platform for efficient data analysis.
As you embark on your data journey, mastering Pandas unlocks a world of possibilities for insightful analysis and creative exploration. Whether you're a seasoned SQL user or a Python enthusiast, integrating Pandas into your workflow empowers you to harness the full potential of your data.
Explore the depths of Pandas, and witness firsthand how this powerful library transforms the landscape of data manipulation, bringing simplicity, speed, and versatility to the forefront of your analytical endeavors.