Python for Data Analysis (English Reprint Edition)

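  • Author: Wes McKinney
  • Publisher: Nanjing: Southeast University Press
  • Year of publication: 2018
  • ISBN: 9787564175191
  • Pages: 526
Book description: A complete introduction to manipulating, processing, cleaning, and crunching datasets in Python. This accessible guide, now in a second edition updated for Python 3.6, includes practical case studies that show how to solve a broad range of data analysis problems effectively. Along the way you will learn the latest versions of pandas, NumPy, IPython, and Jupyter. Written by Wes McKinney, creator of the Python pandas project, the book is a practical, modern introduction to data science tools in Python. It suits analysts who are new to Python as well as Python programmers who are new to data science and scientific computing. Data files and related material are available on GitHub.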

1. Preliminaries 1

1.1 What Is This Book About? 1

What Kinds of Data? 1

1.2 Why Python for Data Analysis? 2

Python as Glue 2

Solving the “Two-Language” Problem 3

Why Not Python? 3

1.3 Essential Python Libraries 4

NumPy 4

pandas 4

matplotlib 5

IPython and Jupyter 6

SciPy 6

scikit-learn 7

statsmodels 8

1.4 Installation and Setup 8

Windows 9

Apple (OS X, macOS) 9

GNU/Linux 9

Installing or Updating Python Packages 10

Python 2 and Python 3 11

Integrated Development Environments (IDEs) and Text Editors 11

1.5 Community and Conferences 12

1.6 Navigating This Book 12

Code Examples 13

Data for Examples 13

Import Conventions 14

Jargon 14

2. Python Language Basics, IPython, and Jupyter Notebooks 15

2.1 The Python Interpreter 16

2.2 IPython Basics 17

Running the IPython Shell 17

Running the Jupyter Notebook 18

Tab Completion 21

Introspection 23

The %run Command 25

Executing Code from the Clipboard 26

Terminal Keyboard Shortcuts 27

About Magic Commands 28

Matplotlib Integration 29

2.3 Python Language Basics 30

Language Semantics 30

Scalar Types 38

Control Flow 46

3. Built-in Data Structures, Functions, and Files 51

3.1 Data Structures and Sequences 51

Tuple 51

List 54

Built-in Sequence Functions 59

dict 61

set 65

List, Set, and Dict Comprehensions 67

3.2 Functions 69

Namespaces, Scope, and Local Functions 70

Returning Multiple Values 71

Functions Are Objects 72

Anonymous (Lambda) Functions 73

Currying: Partial Argument Application 74

Generators 75

Errors and Exception Handling 77

3.3 Files and the Operating System 80

Bytes and Unicode with Files 83

3.4 Conclusion 84

4. NumPy Basics: Arrays and Vectorized Computation 85

4.1 The NumPy ndarray: A Multidimensional Array Object 87

Creating ndarrays 88

Data Types for ndarrays 90

Arithmetic with NumPy Arrays 93

Basic Indexing and Slicing 94

Boolean Indexing 99

Fancy Indexing 102

Transposing Arrays and Swapping Axes 103

4.2 Universal Functions: Fast Element-Wise Array Functions 105

4.3 Array-Oriented Programming with Arrays 108

Expressing Conditional Logic as Array Operations 109

Mathematical and Statistical Methods 111

Methods for Boolean Arrays 113

Sorting 113

Unique and Other Set Logic 114

4.4 File Input and Output with Arrays 115

4.5 Linear Algebra 116

4.6 Pseudorandom Number Generation 118

4.7 Example: Random Walks 119

Simulating Many Random Walks at Once 121

4.8 Conclusion 122

5. Getting Started with pandas 123

5.1 Introduction to pandas Data Structures 124

Series 124

DataFrame 128

Index Objects 134

5.2 Essential Functionality 136

Reindexing 136

Dropping Entries from an Axis 138

Indexing, Selection, and Filtering 140

Integer Indexes 145

Arithmetic and Data Alignment 146

Function Application and Mapping 151

Sorting and Ranking 153

Axis Indexes with Duplicate Labels 157

5.3 Summarizing and Computing Descriptive Statistics 158

Correlation and Covariance 160

Unique Values, Value Counts, and Membership 162

5.4 Conclusion 165

6. Data Loading, Storage, and File Formats 167

6.1 Reading and Writing Data in Text Format 167

Reading Text Files in Pieces 173

Writing Data to Text Format 175

Working with Delimited Formats 176

JSON Data 178

XML and HTML: Web Scraping 180

6.2 Binary Data Formats 183

Using HDF5 Format 184

Reading Microsoft Excel Files 186

6.3 Interacting with Web APIs 187

6.4 Interacting with Databases 188

6.5 Conclusion 190

7. Data Cleaning and Preparation 191

7.1 Handling Missing Data 191

Filtering Out Missing Data 193

Filling In Missing Data 195

7.2 Data Transformation 197

Removing Duplicates 197

Transforming Data Using a Function or Mapping 198

Replacing Values 200

Renaming Axis Indexes 201

Discretization and Binning 203

Detecting and Filtering Outliers 205

Permutation and Random Sampling 206

Computing Indicator/Dummy Variables 208

7.3 String Manipulation 211

String Object Methods 211

Regular Expressions 213

Vectorized String Functions in pandas 216

7.4 Conclusion 219

8. Data Wrangling: Join, Combine, and Reshape 221

8.1 Hierarchical Indexing 221

Reordering and Sorting Levels 224

Summary Statistics by Level 225

Indexing with a DataFrame’s columns 225

8.2 Combining and Merging Datasets 227

Database-Style DataFrame Joins 227

Merging on Index 232

Concatenating Along an Axis 236

Combining Data with Overlap 241

8.3 Reshaping and Pivoting 242

Reshaping with Hierarchical Indexing 243

Pivoting “Long” to “Wide” Format 246

Pivoting “Wide” to “Long” Format 249

8.4 Conclusion 251

9. Plotting and Visualization 253

9.1 A Brief matplotlib API Primer 253

Figures and Subplots 255

Colors, Markers, and Line Styles 259

Ticks, Labels, and Legends 261

Annotations and Drawing on a Subplot 265

Saving Plots to File 267

matplotlib Configuration 268

9.2 Plotting with pandas and seaborn 268

Line Plots 269

Bar Plots 272

Histograms and Density Plots 277

Scatter or Point Plots 280

Facet Grids and Categorical Data 283

9.3 Other Python Visualization Tools 285

9.4 Conclusion 286

10. Data Aggregation and Group Operations 287

10.1 GroupBy Mechanics 288

Iterating Over Groups 291

Selecting a Column or Subset of Columns 293

Grouping with Dicts and Series 294

Grouping with Functions 295

Grouping by Index Levels 295

10.2 Data Aggregation 296

Column-Wise and Multiple Function Application 298

Returning Aggregated Data Without Row Indexes 301

10.3 Apply: General split-apply-combine 302

Suppressing the Group Keys 304

Quantile and Bucket Analysis 305

Example: Filling Missing Values with Group-Specific Values 306

Example: Random Sampling and Permutation 308

Example: Group Weighted Average and Correlation 310

Example: Group-Wise Linear Regression 312

10.4 Pivot Tables and Cross-Tabulation 313

Cross-Tabulations: Crosstab 315

10.5 Conclusion 316

11. Time Series 317

11.1 Date and Time Data Types and Tools 318

Converting Between String and Datetime 319

11.2 Time Series Basics 322

Indexing, Selection, Subsetting 323

Time Series with Duplicate Indices 326

11.3 Date Ranges, Frequencies, and Shifting 327

Generating Date Ranges 328

Frequencies and Date Offsets 330

Shifting (Leading and Lagging) Data 332

11.4 Time Zone Handling 335

Time Zone Localization and Conversion 335

Operations with Time Zone-Aware Timestamp Objects 338

Operations Between Different Time Zones 339

11.5 Periods and Period Arithmetic 339

Period Frequency Conversion 340

Quarterly Period Frequencies 342

Converting Timestamps to Periods (and Back) 344

Creating a PeriodIndex from Arrays 345

11.6 Resampling and Frequency Conversion 348

Downsampling 349

Upsampling and Interpolation 352

Resampling with Periods 353

11.7 Moving Window Functions 354

Exponentially Weighted Functions 358

Binary Moving Window Functions 359

User-Defined Moving Window Functions 361

11.8 Conclusion 362

12. Advanced pandas 363

12.1 Categorical Data 363

Background and Motivation 363

Categorical Type in pandas 365

Computations with Categoricals 367

Categorical Methods 370

12.2 Advanced GroupBy Use 373

Group Transforms and “Unwrapped” GroupBys 373

Grouped Time Resampling 377

12.3 Techniques for Method Chaining 378

The pipe Method 380

12.4 Conclusion 381

13. Introduction to Modeling Libraries in Python 383

13.1 Interfacing Between pandas and Model Code 383

13.2 Creating Model Descriptions with Patsy 386

Data Transformations in Patsy Formulas 389

Categorical Data and Patsy 390

13.3 Introduction to statsmodels 393

Estimating Linear Models 393

Estimating Time Series Processes 396

13.4 Introduction to scikit-learn 397

13.5 Continuing Your Education 401

14. Data Analysis Examples 403

14.1 1.USA.gov Data from Bitly 403

Counting Time Zones in Pure Python 404

Counting Time Zones with pandas 406

14.2 MovieLens 1M Dataset 413

Measuring Rating Disagreement 418

14.3 US Baby Names 1880-2010 419

Analyzing Naming Trends 425

14.4 USDA Food Database 434

14.5 2012 Federal Election Commission Database 440

Donation Statistics by Occupation and Employer 442

Bucketing Donation Amounts 445

Donation Statistics by State 447

14.6 Conclusion 448

A. Advanced NumPy 449

A.1 ndarray Object Internals 449

NumPy dtype Hierarchy 450

A.2 Advanced Array Manipulation 451

Reshaping Arrays 452

C Versus Fortran Order 454

Concatenating and Splitting Arrays 454

Repeating Elements: tile and repeat 457

Fancy Indexing Equivalents: take and put 459

A.3 Broadcasting 460

Broadcasting Over Other Axes 462

Setting Array Values by Broadcasting 465

A.4 Advanced ufunc Usage 466

ufunc Instance Methods 466

Writing New ufuncs in Python 468

A.5 Structured and Record Arrays 469

Nested dtypes and Multidimensional Fields 469

Why Use Structured Arrays? 470

A.6 More About Sorting 471

Indirect Sorts:argsort and lexsort 472

Alternative Sort Algorithms 474

Partially Sorting Arrays 474

numpy.searchsorted: Finding Elements in a Sorted Array 475

A.7 Writing Fast NumPy Functions with Numba 476

Creating Custom numpy.ufunc Objects with Numba 478

A.8 Advanced Array Input and Output 478

Memory-Mapped Files 478

HDF5 and Other Array Storage Options 480

A.9 Performance Tips 480

The Importance of Contiguous Memory 480

B. More on the IPython System 483

B.1 Using the Command History 483

Searching and Reusing the Command History 483

Input and Output Variables 484

B.2 Interacting with the Operating System 485

Shell Commands and Aliases 486

Directory Bookmark System 487

B.3 Software Development Tools 487

Interactive Debugger 488

Timing Code: %time and %timeit 492

Basic Profiling: %prun and %run -p 494

Profiling a Function Line by Line 496

B.4 Tips for Productive Code Development Using IPython 498

Reloading Module Dependencies 498

Code Design Tips 499

B.5 Advanced IPython Features 500

Making Your Own Classes IPython-Friendly 500

Profiles and Configuration 501

B.6 Conclusion 503

Index 505