1. INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN in SQL
-
INNER JOIN: Returns records that have matching values in both tables.
- Use case: When you only want the rows that have matching entries in both tables.
SELECT * FROM Employees INNER JOIN Departments ON Employees.DepartmentID = Departments.ID;
-
LEFT JOIN (or LEFT OUTER JOIN): Returns all records from the left table and the matched records from the right table. Unmatched rows from the right table will have NULL values.
- Use case: When you want all records from the left table, regardless of whether there’s a match in the right table.
SELECT * FROM Employees LEFT JOIN Departments ON Employees.DepartmentID = Departments.ID;
-
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all records from the right table and the matched records from the left table. Unmatched rows from the left table will have NULL values.
- Use case: When you want all records from the right table, regardless of whether there’s a match in the left table.
SELECT * FROM Employees RIGHT JOIN Departments ON Employees.DepartmentID = Departments.ID;
-
FULL OUTER JOIN: Returns all records when there is a match in either table. Rows without matches from either table will contain NULL values.
- Use case: When you want to retrieve all rows from both tables, with or without matches.
SELECT * FROM Employees FULL OUTER JOIN Departments ON Employees.DepartmentID = Departments.ID;
2. Window Functions in SQL
Window functions perform calculations across a set of table rows that are related to the current row, but do not result in data being grouped into fewer rows, unlike aggregate functions. They allow you to compute things like running totals, ranks, or moving averages.
Example: ROW_NUMBER()
, RANK()
, SUM() OVER()
SELECT EmployeeID, Salary, RANK() OVER (ORDER BY Salary DESC) as SalaryRank
FROM Employees;
- Use case: Window functions are useful in scenarios where you need to rank, aggregate, or perform operations on rows while retaining individual row data (e.g., when computing ranks, running totals, or partitioned aggregates).
3. Handling NULL Values in SQL
-
NULL in JOINs: NULLs are ignored in equality comparisons (
=
), which can result in missing matches. For instance,INNER JOIN
won’t return rows where the join column contains NULL values. To handle this, you can use conditions likeIS NULL
orIS NOT NULL
.Example:
SELECT * FROM Employees LEFT JOIN Departments ON Employees.DepartmentID = Departments.ID WHERE Employees.DepartmentID IS NULL;
-
NULL in Aggregations: Functions like
COUNT()
,SUM()
,AVG()
ignore NULLs except forCOUNT(*)
, which counts all rows, including those with NULL values.Example:
SELECT AVG(Salary) FROM Employees WHERE Salary IS NOT NULL;
-
Implications: In JOINs, NULLs can result in missing rows, and in aggregations, NULL values can skew results if not handled correctly.
4. Indexing in SQL Databases
An index is a data structure that improves the speed of data retrieval operations on a table by allowing the database to find rows faster without scanning the entire table.
-
How Indexes Improve Performance: Indexes work like a table of contents in a book, pointing to data more quickly than searching every page. Common types include B-trees and hash indexes.
-
Trade-offs:
- Indexes speed up
SELECT
queries but slow downINSERT
,UPDATE
, andDELETE
operations because the index must also be updated. - Indexes take up additional storage space.
- Indexes speed up
5. SQL Query Optimization Techniques
- Use Indexes: Ensure that relevant columns in
WHERE
clauses and joins are indexed. - **Avoid SELECT ***: Fetch only the necessary columns.
- Proper Joins: Use the right type of join for the task (
INNER JOIN
,LEFT JOIN
, etc.). - Optimize Subqueries: Consider using
JOIN
orCTE
instead of deeply nested subqueries. - Use Query Execution Plan: Use
EXPLAIN
to analyze how the database is executing the query and optimize based on insights. - Limit Rows Returned: Use
LIMIT
orOFFSET
for large result sets when appropriate.
6. Normalization in SQL Databases
Normalization is the process of organizing data to minimize redundancy. The goal is to break down large tables into smaller, related tables.
- 1st Normal Form (1NF): Eliminate repeating groups; ensure each column contains atomic values.
- 2nd Normal Form (2NF): Ensure the table is in 1NF and all non-key attributes are fully dependent on the primary key.
- 3rd Normal Form (3NF): Ensure the table is in 2NF and all attributes are directly dependent on the primary key, not on other non-key attributes.
Impact: Normalization reduces data duplication but may require more complex queries with joins, which can affect performance.
7. Common Table Expressions (CTEs) in SQL
A CTE is a temporary result set that you can reference within a SELECT
, INSERT
, UPDATE
, or DELETE
statement. It’s defined using the WITH
clause.
CTE vs Subquery:
- CTE: Improves readability and is reusable in the same query.
- Subquery: Usually embedded within other queries but is less readable and cannot be reused.
Example:
WITH Sales_CTE AS ( SELECT EmployeeID, SUM(SaleAmount) AS TotalSales FROM Sales GROUP BY EmployeeID
)
SELECT * FROM Sales_CTE WHERE TotalSales > 10000;
8. SQL vs. NoSQL Databases
-
SQL (Relational):
- Structured schema: Uses tables with predefined schemas.
- ACID compliance: Strong consistency, isolation, and transactions.
- Use case: When data relationships are well-defined and consistency is critical (e.g., financial data).
-
NoSQL (Non-Relational):
- Flexible schema: Supports key-value, document, wide-column, or graph models.
- Scalable and fast: Better for unstructured, distributed, or high-velocity data.
- Use case: When flexibility, scalability, and rapid data access are more important than strict consistency (e.g., social media, big data).
9. ACID Properties in SQL Databases
- Atomicity: Ensures that each transaction is treated as a single unit, either fully completing or fully failing.
- Consistency: Guarantees that a transaction takes the database from one valid state to another.
- Isolation: Ensures that transactions operate independently without interference.
- Durability: Once a transaction is committed, it remains so, even in case of a system failure.
These properties are critical for maintaining transactional integrity in SQL databases.
10. Stored Procedures in SQL
Stored procedures are precompiled SQL code that can be executed by the database engine. They allow you to encapsulate business logic and database operations in a reusable and centralized manner.
Advantages:
- Performance: Precompiled, which can reduce execution time.
- Security: Access to data can be controlled more tightly.
- Reusability: Encapsulate logic that can be reused across applications.
Drawbacks:
- Complexity: Can make application logic harder to maintain if too much is encapsulated in stored procedures.
- Portability: Stored procedures are often specific to the database vendor (e.g., SQL Server, MySQL).