update history:
- 2020/02/09 original post
- 2020/02/14 update time functions
Short Answers
Join Table by Where vs by On
The ON
clause defines the relationship between the tables.
The WHERE
clause describes which rows you are interested in.
Many times you can swap them and still get the same result, however this is not always the case with a left outer join.
- If the
ON
clause fails you still get a row with columns from the left table but with nulls in the columns from the right table.- If the
WHERE
clause fails you won’t get that row at all.
Join VS Subquery
Join is also called Cartesian Product
Answer quoted from mysql:
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact
that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types.
Answer quoted from Stackoverflow:
For me, I’ll say that subquery looks easier, more readable and understandable versus join. And you should keep in mind that there might be duplicate records while using join query. So you should always keep cautious.
Exercise
Find ID whose years in 2016 not in 2017, you can't use subquery,
only one select statement
+---+-----+
|ID | Year|
+---+-----+
|1 | 2016|
|1 | 2017|
|2 | 2017|
|3 | 2016|
+---+-----+
This Question restrain you only use one select
statement without any subquery ( left join (select * from xxx)) is forbidden too)
The solution below shows the technique using join
and on
to achieve the same effect of using subquery
Most important is to note that on T2.year = 2017
You should also be skilled at using combination of left join
and is null
SELECT DISTINCT *
FROM ID_Year T1
LEFT JOIN ID_Year T2 ON T1.ID = T2.ID AND T2.year = 2017
WHERE T1.year = 2016
AND T2.year IS NULL
SQL vs NoSQL
Parameter | SQL | NOSQL |
---|---|---|
Definition | SQL databases are primarily called RDBMS or Relational Databases | NoSQL databases are primarily called as Non-relational or distributed database |
Design for | Traditional RDBMS uses SQL syntax and queries to analyze and get the data for further insights. They are used for OLAP systems. | NoSQL database system consists of various kind of database technologies. These databases were developed in response to the demands presented for the development of the modern application. |
Query Language | Structured query language (SQL) | No declarative query language |
Type | SQL databases are table based databases | NoSQL databases can be document based, key-value pairs, graph databases |
Schema | SQL databases have a predefined schema | NoSQL databases use dynamic schema for unstructured data. |
Ability to scale | SQL databases are vertically scalable | NoSQL databases are horizontally scalable |
KEY DIFFERENCE
- SQL pronounced as “S-Q-L” or as “See-Quel” is primarily called RDBMS or Relational Databases whereas NoSQL is a Non-relational or Distributed Database.
- SQL databases are table based databases whereas NoSQL databases can be document based, key-value pairs, graph databases.
- SQL databases are vertically scalable while NoSQL databases are horizontally scalable.
- SQL databases have a predefined schema whereas NoSQL databases use dynamic schema for unstructured data.
- SQL requires specialized DB hardware for better performance while NoSQL uses commodity hardware.
https://stackoverflow.com/questions/46856267/implement-rank-without-using-analytic-function)
Count(1) Vs Count(*)
The difference is simple:
COUNT(*) counts the number of rows produced by the query, whereas COUNT(1) counts the number of 1 values.
Note that when you include a literal such as a number or a string in a query, this literal is “appended” or attached to every row that is produced by the FROM clause. This also applies to literals in aggregate functions, such as COUNT(1). It’s hard to imagine a scenario where the COUNT(*) and COUNT(1) values would be different, so please do let me know if you find one.
In general, you should always use COUNT(*). This is because the database can often count rows by accessing an index, which is much faster than accessing a table. If you use COUNT(column), the database must actually inspect the individual values in the column, since it will not count NULLs. Aggregate functions like COUNT and SUM always ignore NULLs.
One would like to assume that the database optimizer is smart enough to realize that the literal value 1 will never be NULL, and not need to inspect it on every row, but it also would not surprise me to find out that the actions of appending the 1s into every row produced by the FROM clause and counting the 1s are handled by two separate logic modules, and maybe the module doing the counting doesn’t know where the 1s came from. So just use COUNT(*).
In terms of result,
- count(*) == count(1) == count(any number here)
- count(column_name) eliminate NULL
- count(distinct columnn_name) eliminate duplicates
Left Join vs Right Join
-
INNER JOIN: returns rows when there is a match in both tables.
-
LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
-
RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left table.
Short-Short Answers
-
The result of
1+NULL
null
-
Filter from group result
having clause
Data Definition Language (DDL)
Query
create database [name]
= create schema [name]
CREATE TABLE Employee (
empId INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
empSSN CHAR(11) NOT NULL UNIQUE,
...)
Drop database [name]
this will delete an entire database in the blink of an eye, you should be careful
Drop Table [name]
when you have a reference, you cannot delete the table directly
Auto_increment
generate unique IDs
CREATE TABLE Employee (
empId INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
...)
Alter Table
if you want to change a table
ALTER TABLE Student ADD COLUMN birth_year INT;
ALTER TABLE Student DROP COLUMN birth_year;
ALTER TABLE Student MODIFY COLUMN birth_year CHAR(4);
Describe Table
to get information for a particular table
Show tables
to show all tables in the current database
Data Types
- INT / BIGINT / TINYINT / FLOAT / DOUBLE
- DATETIME [YYYY-MM-DD HH:MM:SS] / DATE [YYYY-MM-DD] / TIMESTAMP [YYYY-MM-DD HH:MM:SS] Timestamp may vary according to the time zone while datetime don’t change
- CHAR / VARCHAR
Data Manipulation Language (DML)
Insert
Insert into a subset of fields
Insert Into All fields
Insert into (from another table)
Update
Be careful with UPDATE, if the WHERE clause is missing, it will update every row with the same values
Update table_name SET column_1 = expression_1 WHERE xxx
Update clause can also used with subquery
Delete
Delete [FROM] table_name [WHERE search_condition]
Selection
JOIN
Simulate a full outer join
in mysql. There is no full outer join in mysql
And we can use left join
, union all
and right join
to mimic it
SELECT department_number,
department_name,
employee_id,
first_name,
last_name
FROM employees AS E
LEFT JOIN departments AS D USING(department_number)
UNION ALL
SELECT department_number,
department_name,
employee_id,
first_name,
last_name
FROM employees AS E
RIGHT JOIN departments AS D USING(department_number)
WHERE employee_id IS NULL;
Advanced SQL
With Statement
-
Recursive Query
Read the example and learn how to write recursive query
WITH RECURSIVE cte AS ( SELECT *, 1 AS level FROM employees WHERE first_name = 'Paulo' AND last_name = 'Locario' UNION ALL SELECT E.*, t.level+1 as level FROM employees AS E JOIN cte T ON E.manager_id = t.employee_id ) SELECT last_name,first_name,level from cte ORDER BY employee_id;
The sequence of writing a recursive query is that you need to find a base case first, usually with a
where
constriction. And then, youUNION ALL
the same table, theCTE
table and join on the condition. Much easier than Python query.
Functions
String Selection
Filter for string condition
-- 10. (I do) Show all vendors with names beginning with A to L
SELECT *
FROM vendors
WHERE vendor_name < 'M';
String Functions
- concat()
concat(string / col, string / col / , ...)
Example
SELECT concat(first_name,' ',last_name) AS vendor_names
FROM vendor_contacts;
-
Like / Not Like
-
SUBSTR(string, start, length)
SELECT SUBSTR("SQL Tutorial",5,3)
Return TutSELECT SUBSTR("SQL Tutorial",5)
start from position 5 and select till the end -
INSTR(string1,string2)
The INSTR() function returns the position of the first occurrence of the searched string2.
TIME FUNCTIONS
DATE_TIME
date_time(date, format)
Specifier | Description |
---|---|
%c |
Month, numeric (0 ..12 ) |
%d |
Day of the month, numeric (00 ..31 ) |
%e |
Day of the month, numeric (0 ..31 ) |
%H |
Hour (00 ..23 ) |
%h |
Hour (01 ..12 ) |
%I |
Hour (01 ..12 ) |
%i |
Minutes, numeric (00 ..59 ) |
%k |
Hour (0 ..23 ) |
%l |
Hour (1 ..12 ) |
%m |
Month, numeric (00 ..12 ) |
%s |
Seconds (00 ..59 ) |
%T |
Time, 24-hour (hh:mm:ss ) |
%Y |
Year, numeric, four digits |
%y |
Year, numeric (two digits) |
%% |
A literal % character |
-
DATEDIFF(exp1, exp2)
Return
exp1 - exp2
by days -
TO_DAYS(Date)
Return days of the date
-
CURRENT_DATE() / CURDATE()
Return current date
-
TIMESTAMPDIFF(unit, start ,end )
It will return the desired unit differ and doesn’t require the start and end have the same format
SELECT TIMESTAMPDIFF(MONTH, '2018-01-01', '2018-06-01') result +--------+ | result | +--------+ | 5 | +--------+ 1 row in set
-
DAYOFWEEK(date)
Return range from 1 to 7. Sunday is 1, Saturday is 7.
-
WEEKDAY(date)
The WEEKDAY() function returns the weekday number for a given date.
Note: 0 = Monday, 1 = Tuesday, 2 = Wednesday, 3 = Thursday, 4 = Friday, 5 = Saturday, 6 = Sunday.
-
**MONTH() / YEAR() / DAY() **
Extract the month/day/year of a
timestamp
Others
-
IFULL(column, default return)
If the column is null, it will return the default return, otherwise return the column
Tips
-
In
WHERE
clause, you cannot directly refer to the renamed column you selected, you can usehaving
or restate the original column namei.e.
# correct SELECT invoice_number, invoice_total - payment_total - credit_total AS balance FROM invoices WHERE invoice_total - payment_total - credit_total = 0; # correct SELECT invoice_number, invoice_total - payment_total - credit_total AS balance FROM invoices HAVING balance = 0; # wrong SELECT invoice_number, invoice_total - payment_total - credit_total AS balance FROM invoices WHERE balance = 0;
-
When you are asked to find
earliest
,latest
,most recent
records, remember that you can usemin
ormax()
function to get the value -
When you are conducting
self join
, it`s better to name the table in a more meaningful wayFor example,
employee table
can be names asemployee
andmanager
-
We usually use
full-group_by
so that we should include all selected columns in thegroup by
clause.So be careful that those columns not including in the
groupby
might result errors