3-1 Pandas: Overview
Source: cnblogs  Author: karina512  Date: 2019/10/9 8:56:30

The data used in the Pandas chapters can be downloaded from the following link:

https://files.cnblogs.com/files/AI-robort/Titanic_Data-master.zip

 

Pandas: a library for data analysis and processing

In [1]:
  import pandas as pd
In [4]:
  df = pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
 

.head(): returns the first rows of the DataFrame; the default is 5, or pass the number of rows you want

In [5]:
  df.head(6)
Out[5]:
 
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
5  6            0         3       Moran, Mr. James                                   male    NaN   0      0      330877            8.4583   NaN    Q
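
To make the row-count argument concrete, here is a minimal sketch that does not need the Titanic file. The `toy` frame and its values are made up for illustration; `tail()` is included only as the counterpart of `head()`.

```python
import pandas as pd

# Hypothetical toy data standing in for train.csv
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Survived": [0, 1, 1, 1, 0, 0, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, float("nan"), 54.0],
})

first_default = toy.head()   # no argument: the first 5 rows
first_three = toy.head(3)    # explicit count: the first 3 rows
last_two = toy.tail(2)       # tail() mirrors head(): the last 2 rows

print(first_three)
```

Both methods return a new DataFrame and leave `toy` untouched, so they are safe to call while exploring.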
 

.info(): prints a summary of the DataFrame (index range, columns, non-null counts, dtypes, and memory usage)

In [6]:
  df.info()
 
  <class 'pandas.core.frame.DataFrame'>
  RangeIndex: 891 entries, 0 to 890
  Data columns (total 12 columns):
  PassengerId    891 non-null int64
  Survived       891 non-null int64
  Pclass         891 non-null int64
  Name           891 non-null object
  Sex            891 non-null object
  Age            714 non-null float64
  SibSp          891 non-null int64
  Parch          891 non-null int64
  Ticket         891 non-null object
  Fare           891 non-null float64
  Cabin          204 non-null object
  Embarked       889 non-null object
  dtypes: float64(2), int64(5), object(5)
  memory usage: 83.6+ KB
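
The non-null counts in this summary tell you how much data is missing: with 891 entries, Age has 891 - 714 = 177 missing values and Cabin has 891 - 204 = 687. A minimal sketch of getting those counts directly, using a made-up `toy` frame rather than the real file:

```python
import pandas as pd

# Hypothetical toy data with some missing entries
toy = pd.DataFrame({
    "Age": [22.0, float("nan"), 26.0, float("nan")],
    "Cabin": ["C85", None, None, None],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
})

missing_per_column = toy.isnull().sum()   # number of missing entries per column
non_null_per_column = toy.count()         # the non-null counts that info() reports

print(missing_per_column)
```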
 

Inspecting the table's attributes and details

In [7]:
  df.index  # the row index
Out[7]:
  RangeIndex(start=0, stop=891, step=1)
In [8]:
  df.columns  # the name of each column
Out[8]:
  Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
         'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
        dtype='object')
In [9]:
  df.dtypes  # the data type of each column
Out[9]:
  PassengerId      int64
  Survived         int64
  Pclass           int64
  Name            object
  Sex             object
  Age            float64
  SibSp            int64
  Parch            int64
  Ticket          object
  Fare           float64
  Cabin           object
  Embarked        object
  dtype: object
In [10]:
  df.values  # the underlying data as an array, one row per record
Out[10]:
  array([[1, 0, 3, ..., 7.25, nan, 'S'],
         [2, 1, 1, ..., 71.2833, 'C85', 'C'],
         [3, 1, 3, ..., 7.925, nan, 'S'],
         ...,
         [889, 0, 3, ..., 23.45, nan, 'S'],
         [890, 1, 1, ..., 30.0, 'C148', 'C'],
         [891, 0, 3, ..., 7.75, nan, 'Q']], dtype=object)
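
The same attributes can be tried on a small frame. This is a sketch with made-up data; `shape` and `to_numpy()` are not used in the original notebook and are shown here as related tools (`to_numpy()` returns the same array as `.values`).

```python
import pandas as pd

# Hypothetical two-row frame mixing text and numbers
toy = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.5]})

print(toy.shape)            # (number of rows, number of columns)
print(list(toy.columns))    # column labels as a plain list
arr = toy.to_numpy()        # same data as toy.values, as a NumPy array
print(arr.dtype)            # object, because the columns mix strings and floats
```

The `dtype=object` in the Titanic output above has the same cause: a single array has to hold both the text columns and the numeric ones.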
 

Creating a DataFrame yourself

In [11]:
  data = {'country': ['aaa', 'bbb', 'ccc'], 'population': [10, 12, 14]}
  df_data = pd.DataFrame(data)
  df_data
Out[11]:
 
   country  population
0  aaa      10
1  bbb      12
2  ccc      14
In [12]:
  df_data.info()
 
  <class 'pandas.core.frame.DataFrame'>
  RangeIndex: 3 entries, 0 to 2
  Data columns (total 2 columns):
  country       3 non-null object
  population    3 non-null int64
  dtypes: int64(1), object(1)
  memory usage: 128.0+ bytes
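
A dict of columns is only one way to build a DataFrame. As a sketch, here are two other common constructions with the same made-up country data: a list of per-row dicts, and an explicit `index` in place of the default 0, 1, 2.

```python
import pandas as pd

# From a list of records: each dict becomes one row
records = [
    {"country": "aaa", "population": 10},
    {"country": "bbb", "population": 12},
    {"country": "ccc", "population": 14},
]
df_records = pd.DataFrame(records)

# From a dict of columns, with explicit row labels instead of 0, 1, 2
df_labeled = pd.DataFrame(
    {"population": [10, 12, 14]},
    index=["aaa", "bbb", "ccc"],
)

print(df_labeled.loc["bbb", "population"])
```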
In [15]:
  age = df['Age']  # select a single column
  age[:5]  # show the first 5 rows
Out[15]:
  0    22.0
  1    38.0
  2    26.0
  3    35.0
  4    35.0
  Name: Age, dtype: float64
 

Series: a single row or column of a DataFrame

In [16]:
  age.index
Out[16]:
  RangeIndex(start=0, stop=891, step=1)
In [17]:
  age.values[:5]
Out[17]:
  array([22., 38., 26., 35., 35.])
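
To illustrate that both a column and a row come back as a Series, here is a minimal sketch on a made-up two-row frame (`df_small` is hypothetical). A column Series is indexed by the row labels; a row Series is indexed by the column names.

```python
import pandas as pd

df_small = pd.DataFrame({"country": ["aaa", "bbb"], "population": [10, 12]})

column = df_small["population"]   # one column -> Series indexed by row label
row = df_small.iloc[0]            # one row -> Series indexed by column name

print(type(column).__name__, list(column.index))
print(type(row).__name__, list(row.index))
```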
In [18]:
  df.head()
Out[18]:
 
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
In [19]:
  df['Age'][:5]
Out[19]:
  0    22.0
  1    38.0
  2    26.0
  3    35.0
  4    35.0
  Name: Age, dtype: float64
 

Changing the index

In [20]:
  df = df.set_index('Name')
  df.head()
Out[20]:
 
                                                     PassengerId  Survived  Pclass  Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
Name
Braund, Mr. Owen Harris                              1            0         3       male    22.0  1      0      A/5 21171         7.2500   NaN    S
Cumings, Mrs. John Bradley (Florence Briggs Thayer)  2            1         1       female  38.0  1      0      PC 17599          71.2833  C85    C
Heikkinen, Miss. Laina                               3            1         3       female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
Futrelle, Mrs. Jacques Heath (Lily May Peel)         4            1         1       female  35.0  1      0      113803            53.1000  C123   S
Allen, Mr. William Henry                             5            0         3       male    35.0  0      0      373450            8.0500   NaN    S
In [21]:
  df['Age'][:5]
Out[21]:
  Name
  Braund, Mr. Owen Harris                                22.0
  Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
  Heikkinen, Miss. Laina                                 26.0
  Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
  Allen, Mr. William Henry                               35.0
  Name: Age, dtype: float64
In [25]:
  age = df['Age']
  age[:5]
Out[25]:
  Name
  Braund, Mr. Owen Harris                                22.0
  Cumings, Mrs. John Bradley (Florence Briggs Thayer)    38.0
  Heikkinen, Miss. Laina                                 26.0
  Futrelle, Mrs. Jacques Heath (Lily May Peel)           35.0
  Allen, Mr. William Henry                               35.0
  Name: Age, dtype: float64
In [26]:
  age['Allen, Mr. William Henry']  # look up the value for an index label
Out[26]:
  35.0
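
A short sketch of the same idea on made-up data (the `people` frame and its names are hypothetical). It also shows `.loc` for label-based lookup and `reset_index()` for undoing the change; neither appears in the original notebook.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [22.0, 38.0, 26.0],
})

by_name = people.set_index("Name")     # returns a new frame; people is unchanged
bob_age = by_name.loc["Bob", "Age"]    # label-based lookup: row "Bob", column "Age"
first_age = by_name["Age"].iloc[0]     # position-based lookup still works
restored = by_name.reset_index()       # move "Name" back to an ordinary column

print(bob_age, first_age)
```

Because `set_index` returns a new DataFrame, the notebook assigns the result back to `df`; without the assignment the original frame would keep its numeric index.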
In [27]:
  age = age + 10
  age[:5]
Out[27]:
  Name
  Braund, Mr. Owen Harris                                32.0
  Cumings, Mrs. John Bradley (Florence Briggs Thayer)    48.0
  Heikkinen, Miss. Laina                                 36.0
  Futrelle, Mrs. Jacques Heath (Lily May Peel)           45.0
  Allen, Mr. William Henry                               45.0
  Name: Age, dtype: float64
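
Arithmetic on a Series is applied element by element and produces a new Series, so rebinding `age` does not alter the column stored in the DataFrame. A minimal sketch with a made-up `frame`:

```python
import pandas as pd

frame = pd.DataFrame({"Age": [22.0, 38.0, float("nan")]})

age = frame["Age"]
age = age + 10                  # builds a new Series; NaN stays NaN

print(age.tolist())
print(frame["Age"].tolist())    # the column inside the frame is unchanged
```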
 

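Summary statistics on the values (age here is the Series with 10 added in In [27], so each result is 10 higher than for the original Age column)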

In [28]:
  age.mean()
Out[28]:
  39.69911764705882
In [29]:
  age.max()
Out[29]:
  90.0
In [30]:
  age.min()
Out[30]:
  10.42
In [31]:
  df.describe()  # basic summary statistics for every numeric column at once
Out[31]:
 
       PassengerId  Survived     Pclass       Age          SibSp        Parch        Fare
count  891.000000   891.000000   891.000000   714.000000   891.000000   891.000000   891.000000
mean   446.000000   0.383838     2.308642     29.699118    0.523008     0.381594     32.204208
std    257.353842   0.486592     0.836071     14.526497    1.102743     0.806057     49.693429
min    1.000000     0.000000     1.000000     0.420000     0.000000     0.000000     0.000000
25%    223.500000   0.000000     2.000000     20.125000    0.000000     0.000000     7.910400
50%    446.000000   0.000000     3.000000     28.000000    0.000000     0.000000     14.454200
75%    668.500000   1.000000     3.000000     38.000000    1.000000     0.000000     31.000000
max    891.000000   1.000000     3.000000     80.000000    8.000000     6.000000     512.329200
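
The Age count of 714 in this table, against 891 for the other columns, reflects that these statistics skip missing values by default. A small sketch of that behaviour on a made-up Series (`ages` is hypothetical):

```python
import pandas as pd

ages = pd.Series([20.0, 30.0, float("nan"), 40.0])

print(ages.count())              # 3: only non-missing values are counted
print(ages.mean())               # 30.0: NaN is skipped by default
print(ages.mean(skipna=False))   # nan: a missing value propagates to the result
summary = ages.describe()        # count, mean, std, min, quartiles, max
print(summary)
```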

Original article: http://www.cnblogs.com/AI-robort/p/11636703.html
