标签:
数据库是一个数据存储区用于存储、查询和处理数据。数据库存储我们需要的数据并且开放一个和数据交互的接口。大多数科技公司使用数据库来组织数。数据库系统包括数据库管理软件与管理控制、安全和访问控制,语言与数据库接口这些内容。
首先,我们将关注SQL语言一个结构化查询语言。它是用来查询、更新和修改数据库中的数据。
SQL是最常见的一种数据库语言,在任何数据专业工具箱中都是一个重要的工具。虽然SQL是一种语言,但它完全不同于像Python或r 这样的语言, SQL是专门为了与数据库进行交互而建立的一种语言,不会像传统编程语言一样有很多的功能。由于SQL是一种声明式语言,用户需要关注的是表达他想要的东西而计算机则集中在如何执行计算。
一个数据库就是tables的集合,一个表格包含一行行的数据,以及列信息。table和Pandas中的DataFrame很像。
本文的数据集是**recent-grads.csv**2010到2012的毕业生信息,它的属性如下:
- Rank - Rank by median earnings.
- Major_code - Major code.
- Major - Major description.
- Major_category - Category of major.
- Total - Total number of people with major.
- Sample_size - Sample size (unweighted) of full-time.
- Men - Male graduates.
- Women - Female graduates.
- ShareWomen - Women as share of total.
- Employed - Number employed.
SELECT [columnA, columnB, ...]
FROM tableName;
SELECT Rank,Major
FROM recent_grads;
SQLite是一个轻量级别的数据库,用来探索和学习SQL是再好不过了。
SELECT Rank,Major FROM recent_grads;
‘‘‘
[["Rank", "Major"],
[1, "PETROLEUM ENGINEERING"],
[2, "MINING AND MINERAL ENGINEERING"],
[3, "METALLURGICAL ENGINEERING"],
[4, "NAVAL ARCHITECTURE AND MARINE ENGINEERING"],
. . .
‘‘‘
SELECT Major,Rank FROM recent_grads;
‘‘‘
[["Major", "Rank"],
["PETROLEUM ENGINEERING", 1],
["MINING AND MINERAL ENGINEERING", 2],
["METALLURGICAL ENGINEERING", 3],
. . .
‘‘‘
SELECT Rank,Major_code,Major,Major_category,Total FROM recent_grads;
‘‘‘
[["Rank", "Major_code", "Major", "Major_category", "Total"],
[1, 2419, "PETROLEUM ENGINEERING", "Engineering", 2339],
[2, 2416, "MINING AND MINERAL ENGINEERING", "Engineering", 756],
[3, 2415, "METALLURGICAL ENGINEERING", "Engineering", 856],
[4, 2417, "NAVAL ARCHITECTURE AND MARINE ENGINEERING", "Engineering", 1258],
. . .
‘‘‘
我们用select返回的是整个列的数据,倘若我们想要返回某列满足一些条件的数据,则需要用到where语句,where语句是用来过滤的。
比如我们想获取哪些Major的女生要大于男生,使用where需要三个东西:
SELECT Major,ShareWomen
FROM recent_grads
WHERE ShareWomen > 0.5;
‘‘‘
[["Major", "ShareWomen"],
["ACTUARIAL SCIENCE", 0.535714286],
["COMPUTER SCIENCE", 0.578766338],
["ENVIRONMENTAL ENGINEERING", 0.558548009],
["NURSING", 0.896018988],
. . .
‘‘‘
where语句中可以使用的比较符:< <= > >= = !=
SELECT Major,Employed FROM recent_grads WHERE Employed > 10000;
‘‘‘
[["Major", "Employed"],
["CHEMICAL ENGINEERING", 25694],
["MECHANICAL ENGINEERING", 76442],
["ELECTRICAL ENGINEERING", 61928],
. . .
‘‘‘
前面查询语句返回的数据都很多,有的时候这很麻烦,我们需要做一个限制。SQL中有一个LIMIT 语句可以实现这个功能,LIMIT 语句放在查询的最后。
SELECT Major FROM recent_grads LIMIT 5;
‘‘‘
[["PETROLEUM ENGINEERING"], ["MINING AND MINERAL ENGINEERING"], ["METALLURGICAL ENGINEERING"], ["NAVAL ARCHITECTURE AND MARINE ENGINEERING"], ["CHEMICAL ENGINEERING"]]
‘‘‘
SELECT Major FROM recent_grads WHERE Employed>10000 LIMIT 10;
‘‘‘
[["Major"], ["CHEMICAL ENGINEERING"], ["MECHANICAL ENGINEERING"], ["ELECTRICAL ENGINEERING"], ["COMPUTER ENGINEERING"], ["AEROSPACE ENGINEERING"], ["BIOMEDICAL ENGINEERING"], ["INDUSTRIAL AND MANUFACTURING ENGINEERING"], ["GENERAL ENGINEERING"], ["COMPUTER SCIENCE"], ["MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"]]
‘‘‘
where语句中有6个比较操作,逻辑操作(Logical operators)最常用的是OR 以及AND,使用逻辑操作可以连接多个比较操作,来进行更细致的过滤。
SELECT [column1, column2,...] FROM [table1]
WHERE [condition1] AND [condition2]
SELECT Major,ShareWomen,Employed FROM recent_grads WHERE ShareWomen>0.5 AND Employed>10000 LIMIT 10;
SELECT [column1, column2,...] FROM [table1]
WHERE [condition1] OR [condition2]
SELECT Major,Median,Unemployed FROM recent_grads WHERE Median >= 10000 OR Unemployed <= 1000 LIMIT 20;
select Major, Major_category, ShareWomen, Unemployment_rate
from recent_grads
where (Major_category = ‘Engineering‘) and (ShareWomen > 0.5 or Unemployment_rate < 0.051);
‘‘‘
[["Major", "Major_category", "ShareWomen", "Unemployment_rate"],
["PETROLEUM ENGINEERING", "Engineering", 0.120564344, 0.018380527],
["METALLURGICAL ENGINEERING", "Engineering", 0.153037383, 0.024096386],
["NAVAL ARCHITECTURE AND MARINE ENGINEERING", "Engineering", 0.107313196, 0.050125313],
‘‘‘
where (Major_category = ‘Engineering‘ and ShareWomen > 0.5) or (Unemployment_rate < 0.051)
select Major, Major_category, Employed, Unemployment_rate
from recent_grads
where (Major_category = ‘Business‘ or Major_category = ‘Arts‘ or Major_category = ‘Health‘)
and (Employed > 20000 or Unemployment_rate < 0.051);
SELECT [column1, column2,...] FROM [table1]
WHERE [conditions]..
ORDER BY column1 [ASC or DESC]
select Major
from recent_grads
order by Major desc
limit 10;
‘‘‘
[["Major"], ["ZOOLOGY"], ["VISUAL AND PERFORMING ARTS"], ["UNITED STATES HISTORY"], ["TREATMENT THERAPY PROFESSIONS"], ["TRANSPORTATION SCIENCES AND TECHNOLOGIES"], ["THEOLOGY AND RELIGIOUS VOCATIONS"], ["TEACHER EDUCATION: MULTIPLE LEVELS"], ["STUDIO ARTS"], ["STATISTICS AND DECISION SCIENCE"], ["SPECIAL NEEDS EDUCATION"]]
‘‘‘
select [column1, column2..]
from table_name
order by column1 (asc or desc), column2 (asc or desc)
select Major_category, Median, Major
from recent_grads
order by Major asc, Median desc
limit 20;
标签:
原文地址:http://blog.csdn.net/zm714981790/article/details/51329269