Data Manipulation in Python: Working with Databases and CSV Files

Data manipulation is a crucial skill in the world of data science. In this tutorial, we will explore how to work with databases and CSV files using Python, a popular programming language for data analysis. We will cover the basics of reading, writing, and updating data in both formats, using SQLite for databases and Python’s built-in csv module for CSV files.

Working with SQLite Databases in Python

SQLite is an open-source, file-based database management system that is easy to use and widely supported. Python includes the sqlite3 module in its standard library, allowing you to interact with SQLite databases without requiring any additional installations.

To get started, import the sqlite3 module and create a connection to a database file:

import sqlite3
conn = sqlite3.connect("example.db")

This will create a new file named “example.db” if it does not already exist. Next, we will create a table in the database and insert some data:

cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
users = [(1, "Alice", 30), (2, "Bob", 25), (3, "Cathy", 27)]
cursor.executemany("INSERT INTO users VALUES (?,?,?)", users)
conn.commit()

Here, we first created a cursor object that allows us to execute SQL commands. We then created a table named “users” with three columns: id, name, and age. We used the “IF NOT EXISTS” clause to avoid creating the table if it already exists. Finally, we inserted three rows of data into the table and committed the changes to the database.

To query the data from the table, we can use the “SELECT” statement:

cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
for row in rows:
    print(row)

This will output the following:

(1, 'Alice', 30)
(2, 'Bob', 25)
(3, 'Cathy', 27)

Now, let’s say Alice just had her birthday and we want to update her age in our database. We can do this using the “UPDATE” statement:

cursor.execute("UPDATE users SET age = ? WHERE name = ?", (31, 'Alice'))
conn.commit()

This command will change Alice’s age from 30 to 31. To verify the changes, we can print out the contents of the table again:

cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
for row in rows:
    print(row)

The output should now reflect the updated age:

(1, 'Alice', 31)
(2, 'Bob', 25)
(3, 'Cathy', 27)

Remember to close the connection when you are done working with the database:

conn.close()

Working with CSV Files in Python

Python provides the built-in csv module to work with CSV files. First, let’s create a CSV file with the same data as our SQLite example:

import csv

users = [(1, "Alice", 30), (2, "Bob", 25), (3, "Cathy", 27)]

with open("example.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["id", "name", "age"])
    writer.writerows(users)

Here, we opened a new file named “example.csv” in write mode, and created a csv.writer object to write data to the file. We first wrote the header row, and then wrote the rows of data using the writerows() method.

To read the data from the CSV file, we can use the csv.reader object:

with open("example.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

This will output the following:

['id', 'name', 'age']
['1', 'Alice', '30']
['2', 'Bob', '25']
['3', 'Cathy', '27']

Now, let’s say we want to add a new user, David, who is 32 years old. Here’s how you can append a new row to your CSV file:

new_user = [(4, "David", 32)]
with open("example.csv", "a", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(new_user)

To verify that the new user has been added, you can read and print out the contents of the CSV file again:

with open("example.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

You should see:

['id', 'name', 'age']
['1', 'Alice', '30']
['2', 'Bob', '25']
['3', 'Cathy', '27']
['4', 'David', '32']

Next, we’ll remove a user, say Alice, from the data. This task is a bit trickier as CSV files don’t support direct deletion of rows. We’ll need to read the data into a list, modify the list, and then write it back to the file:

with open("example.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)

with open("example.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    for row in rows:
        if row[1] != 'Alice':
            writer.writerow(row)

Now, when we print out the contents of the CSV file, Alice should no longer be in the list:

with open("example.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

The output should now be:

['id', 'name', 'age']
['2', 'Bob', '25']
['3', 'Cathy', '27']
['4', 'David', '32']

With these examples, you’ve learned not only how to add new data to a CSV file, but also how to remove specific rows from a CSV file.

In this tutorial, we have explored the basics of data manipulation in Python, using SQLite databases and CSV files. With these skills, you are now better equipped to handle and analyze data in your data science projects. Happy coding!