Python pandas

pandas is a Python library for data manipulation and analysis. Quick facts:

Fact	Description
Homepage	https://pandas.pydata.org
API doc	https://pandas.pydata.org/docs/reference/index.html
Initial commit	Aug 05, 2009 (13 years ago as of 2022). https://github.com/pandas-dev/pandas/commit/ec1a0a2a2
Source code	https://github.com/pandas-dev/pandas
Stack Overflow tag	https://stackoverflow.com/questions/tagged/pandas
Latest stable version	`1.4.2` (02 April, 2022)

Development environment

Install pandas

Version of Python

(pythonProject1) C:\Users\donhu>python --version
Python 3.10.0

Properties and methods of pandas objects

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)
print("\n01-----------------")
print(df)
print()
print("\n02-----------------")
print(df["Age"])  # selecting one column returns a Series
print("\n03-----------------")
ages = pd.Series([22, 35, 58], name="Age")
print(ages)
print("\n04-----------------")
print(df["Age"].max())
print("\n05-----------------")
print(ages.max())
print("\n06-----------------")
print(df.describe())  # summary statistics for the numeric columns
# https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv
print("\n07-----------------")
titanic = pd.read_csv("vy/titanic.csv")
print(titanic)
print("\n08-----------------")
print(titanic.head(2))
print("\n09-----------------")
print(titanic.dtypes)
print("\n10-----------------")
# pip install openpyxl
# conda install openpyxl
print(titanic.to_excel("minh_thu.xlsx", sheet_name="lovers", index=False))  # to_excel() returns None
print("\n11-----------------")
my_titanic = pd.read_excel("minh_thu.xlsx", sheet_name="lovers")
print(my_titanic.head(3))
print("\n12-----------------")
print(my_titanic.info())  # info() prints its summary itself and returns None
# https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv
url = (
    "https://raw.github.com/pandas-dev"
    "/pandas/main/pandas/tests/io/data/csv/tips.csv"
)
tips = pd.read_csv(url)
print("\n12b-----------------")
print(tips)
print("\n14-----------------")
sorted_df = tips.sort_values(by='total_bill')  # ascending by default
print(sorted_df)
print("\n15-----------------")
sorted_df = tips.sort_values(by='total_bill', ascending=False)
print(sorted_df)

Result

C:\ProgramData\Anaconda3\envs\pythonProject1\python.exe C:/Users/donhu/PycharmProjects/pythonProject1/vy_panda_01.py

01-----------------
                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female

02-----------------
0    22
1    35
2    58
Name: Age, dtype: int64

03-----------------
0    22
1    35
2    58
Name: Age, dtype: int64

04-----------------
58

05-----------------
58

06-----------------
             Age
count   3.000000
mean   38.333333
std    18.230012
min    22.000000
25%    28.500000
50%    35.000000
75%    46.500000
max    58.000000

07-----------------
     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0              1         0       3  ...   7.2500   NaN         S
1              2         1       1  ...  71.2833   C85         C
2              3         1       3  ...   7.9250   NaN         S
3              4         1       1  ...  53.1000  C123         S
4              5         0       3  ...   8.0500   NaN         S
..           ...       ...     ...  ...      ...   ...       ...
886          887         0       2  ...  13.0000   NaN         S
887          888         1       1  ...  30.0000   B42         S
888          889         0       3  ...  23.4500   NaN         S
889          890         1       1  ...  30.0000  C148         C
890          891         0       3  ...   7.7500   NaN         Q

[891 rows x 12 columns]

08-----------------
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C

[2 rows x 12 columns]

09-----------------
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

10-----------------
None

11-----------------
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S

[3 rows x 12 columns]

12-----------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None

12b-----------------
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]

14-----------------
     total_bill    tip     sex smoker   day    time  size
67         3.07   1.00  Female    Yes   Sat  Dinner     1
92         5.75   1.00  Female    Yes   Fri  Dinner     2
111        7.25   1.00  Female     No   Sat  Dinner     1
172        7.25   5.15    Male    Yes   Sun  Dinner     2
149        7.51   2.00    Male     No  Thur   Lunch     2
..          ...    ...     ...    ...   ...     ...   ...
182       45.35   3.50    Male    Yes   Sun  Dinner     3
156       48.17   5.00    Male     No   Sun  Dinner     6
59        48.27   6.73    Male     No   Sat  Dinner     4
212       48.33   9.00    Male     No   Sat  Dinner     4
170       50.81  10.00    Male    Yes   Sat  Dinner     3

[244 rows x 7 columns]

15-----------------
     total_bill    tip     sex smoker   day    time  size
170       50.81  10.00    Male    Yes   Sat  Dinner     3
212       48.33   9.00    Male     No   Sat  Dinner     4
59        48.27   6.73    Male     No   Sat  Dinner     4
156       48.17   5.00    Male     No   Sun  Dinner     6
182       45.35   3.50    Male    Yes   Sun  Dinner     3
..          ...    ...     ...    ...   ...     ...   ...
149        7.51   2.00    Male     No  Thur   Lunch     2
111        7.25   1.00  Female     No   Sat  Dinner     1
172        7.25   5.15    Male    Yes   Sun  Dinner     2
92         5.75   1.00  Female    Yes   Fri  Dinner     2
67         3.07   1.00  Female    Yes   Sat  Dinner     1

[244 rows x 7 columns]

Process finished with exit code 0
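
The `None` lines under sections 10 and 12 of the output come from wrapping `to_excel()` and `info()` in `print()`. Both methods do their work as a side effect and return `None`. A minimal sketch of the `info()` case follows; the `demo` DataFrame and its values are made up for illustration and are not part of the Titanic data.

```python
import io

import pandas as pd

# Hypothetical three-row frame, used only to illustrate the return value.
demo = pd.DataFrame({"Age": [22, 35, 58]})

# info() writes its summary to a buffer (stdout by default) and returns None.
buffer = io.StringIO()
result = demo.info(buf=buffer)
summary = buffer.getvalue()

print(result)   # None
print(summary)  # the text that info() would normally print by itself
```

Calling `my_titanic.info()` without `print()` would show the same summary without the trailing `None`.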

Pandas Excel API

Install pandas and openpyxl in your Miniconda environment before trying these examples. The first example uses the read_excel function.
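
One way to confirm that both packages are importable before running the Excel examples is sketched below. The helper name `missing_packages` is hypothetical; it relies only on `importlib.util.find_spec` from the standard library, which returns `None` when a top-level module cannot be found.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["pandas", "openpyxl"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("pandas and openpyxl are available")
```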

import pandas as pd

found_url = "https://m.hvtc.edu.vn/Portals/0/01_2018/01.DS%20TN_9.2021%20.xlsx"
hehe = pd.read_excel(found_url)
hehe

Result

Without header

hihi = pd.read_excel(found_url, index_col=None, header=None)
hihi
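
The effect of `header=None` can be seen without an Excel file. The sketch below uses `pd.read_csv`, which accepts the same `header` parameter, on a small in-memory table; the sample data and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: one header row followed by two data rows.
raw = "name,age\nAn,22\nBinh,35\n"

# Default: the first row becomes the column labels.
with_header = pd.read_csv(io.StringIO(raw))

# header=None: every row is data and columns are numbered 0, 1, ...
without_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_header.columns.tolist())           # ['name', 'age']
print(without_header.columns.tolist())        # [0, 1]
print(len(with_header), len(without_header))  # 2 3
```

With `header=None` the original header row is kept as the first data row, which is why the second frame has one more row.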

The mode 'rb' combines r (read) and b (binary), so the file is opened for reading in binary mode. See https://docs.python.org/3/library/functions.html#open

hoho = pd.read_excel(open('C:\\Users\\donhu\\Desktop\\01.DS TN_9.2021 .xlsx', 'rb'), sheet_name='LC22')
hoho
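
To illustrate what binary mode means, here is a small sketch that does not need an Excel file. It writes a few arbitrary bytes to a temporary file and reads them back with `'rb'`; the payload and variable names are placeholders. It also uses a `with` block, which closes the file automatically, something the bare `open(...)` call above does not do.

```python
import os
import tempfile

# Arbitrary placeholder bytes; a real .xlsx file would hold binary data too.
payload = b"\x00\x01 demo bytes"

# Create a temporary file so the example does not depend on any local path.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # 'rb' = read + binary: read() returns bytes, with no text decoding.
    with open(path, "rb") as handle:
        data = handle.read()
finally:
    os.remove(path)

print(type(data))       # <class 'bytes'>
print(data == payload)  # True
```

The same pattern should work with read_excel by passing the open file object (here it would be `handle`) as the first argument inside the `with` block.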
