NumPy vs Pandas

NumPy vs Pandas: What are the differences?

Developers describe NumPy as the "Fundamental package for scientific computing with Python". Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of databases. Pandas, on the other hand, is described as "High-performance, easy-to-use data structures and data analysis tools for the Python programming language". It is a flexible and powerful data analysis and manipulation library for Python that provides labeled data structures similar to R data.frame objects, statistical functions, and much more.

NumPy and Pandas can be primarily classified as "Data Science" tools.

Some of the features offered by NumPy are:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
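
As a rough illustration of the first two features, the sketch below builds a small 2-D array and lets broadcasting apply a per-column and a per-row operation without explicit loops. The array contents and variable names are made up for the example.

```python
import numpy as np

# A 2-D array: 3 rows (samples) by 4 columns (features). Values are arbitrary.
data = np.arange(12, dtype=float).reshape(3, 4)

# Broadcasting: the per-column mean has shape (4,), and NumPy stretches it
# across all 3 rows, so the subtraction needs no explicit loop.
col_means = data.mean(axis=0)
centered = data - col_means

# A column vector of shape (3, 1) broadcasts across the 4 columns instead,
# scaling each row by its own factor.
row_scale = np.array([[1.0], [10.0], [100.0]])
scaled = data * row_scale

print(centered)
print(scaled)
```

The shapes only need to be compatible dimension by dimension (equal, or one of them 1), which is why both a `(4,)` and a `(3, 1)` operand work against a `(3, 4)` array.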

On the other hand, Pandas provides the following key features:

  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
  • Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
  • Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
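
The three features above can be sketched in a few lines. This is a minimal example with made-up labels and values, not a complete tour of the API.

```python
import numpy as np
import pandas as pd

# Missing data: NaN marks the gap, and reductions such as mean() skip it
# by default.
prices = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"])
mean_price = prices.mean()
filled = prices.fillna(0.0)

# Size mutability: columns can be added to and dropped from a DataFrame.
df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
df["y"] = df["x"] * 2
df = df.drop(columns=["x"])

# Automatic alignment: arithmetic matches values by label, not by position.
# Labels present on only one side produce NaN in the result.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = left + right

# Explicit alignment: keep only the labels the two Series share.
left_aligned, right_aligned = left.align(right, join="inner")

print(total)
print(left_aligned.index.tolist())
```

Here `total` has a value only for the shared labels `b` and `c`; the unmatched labels `a` and `d` come out as NaN rather than raising an error.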

NumPy and Pandas are both open source tools. Pandas, with 20K GitHub stars and 7.92K forks, appears to have broader adoption than NumPy, which has 10.9K GitHub stars and 3.64K forks.

Instacart, SendGrid, and Sighten are some of the popular companies that use Pandas, whereas NumPy is used by Instacart, SendGrid, and SweepSouth. Pandas has broader approval, being mentioned in 73 company stacks and 46 developer stacks, compared to NumPy, which is listed in 62 company stacks and 32 developer stacks.

Decisions about NumPy and Pandas
Xi Huang
Developer at University of Toronto | 8 upvotes · 45K views

For data analysis, we chose a Python-based framework because of Python's simplicity as well as its large community and available supporting tools. We chose PyTorch over TensorFlow as our machine learning library because it has a flatter learning curve and is easy to debug, and because our team has some existing experience with PyTorch. NumPy is used for data processing because of its user-friendliness, efficiency, and integration with the other tools we have chosen. Finally, we decided to include Anaconda in our dev process because its simple setup provides a sufficient data science environment for our purposes. The trained model then gets deployed to the back end as a pickle.

ML Model Training and Benchmarking

We chose Python for ML and data analysis because of its:

  • Simple syntax and ease of use
  • ML library and framework support

The Python libraries and frameworks we chose for ML are:

  1. TensorFlow
     • High performance (GPU support, highly parallel)
     • Easy to debug
     • Visualization support
  2. NumPy
     • Easy matrix manipulation
     • Data types with high compatibility
  3. Pandas
     • High efficiency when handling large data
     • Dataset manipulation and customization
  4. Matplotlib
     • Simple and easy to use

A large part of our product is training and using a machine learning model, so we chose Python, one of the strongest languages for machine learning. Python has many packages that help build and integrate ML models. For the main portion of the machine learning, we chose PyTorch, as it is one of the highest quality ML packages for Python and allows a great deal of creativity with models without being too complex. We also chose to include scikit-learn, as it contains many useful functions and models that can be quickly deployed. Scikit-learn is perfect for testing models, but it does not have as much flexibility as PyTorch. We also include NumPy and Pandas, as these are wonderful Python packages for data manipulation. For testing models and depicting data, we have chosen Matplotlib and seaborn. Matplotlib is the standard for displaying data in Python and ML, while seaborn is a package built on top of Matplotlib that creates very visually pleasing plots.

Pros of NumPy
  • Great for data analysis (6 upvotes)

Pros of Pandas
  • Easy data frame management (18 upvotes)
  • Extensive file format compatibility (1 upvote)

What is NumPy?

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
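
To make "arbitrary data types" concrete, here is a small sketch using a NumPy structured dtype, which describes a record much like a row in a database table. The field names and values are hypothetical and chosen only for illustration.

```python
import numpy as np

# A structured dtype describes one record: a fixed-width string, a 32-bit
# integer, and a 64-bit float. The field names are made up for this example.
record = np.dtype([("name", "U10"), ("age", "i4"), ("score", "f8")])

rows = np.array([("ada", 36, 91.5), ("linus", 28, 78.0)], dtype=record)

# Fields are accessed by name and behave like ordinary arrays.
ages = rows["age"]

# Boolean masks work on records too: select names with a score above 80.
top = rows[rows["score"] > 80]["name"]

print(ages)
print(top)
```

Because each record has a fixed binary layout, an array like this can map directly onto rows read from a file or database, which is one way to read the integration claim above.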

What is Pandas?

Pandas is a flexible and powerful data analysis and manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
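
A minimal sketch of what labeled data structures and built-in statistics look like in practice follows. The column names, row labels, and measurements are invented for the example.

```python
import pandas as pd

# A DataFrame carries labels on both axes: named columns and a row index.
df = pd.DataFrame(
    {"height_cm": [170.0, 182.0, 165.0], "weight_kg": [65.0, 80.0, 58.0]},
    index=["ann", "bob", "cy"],
)

# Label-based lookup: row "bob", column "height_cm".
bob_height = df.loc["bob", "height_cm"]

# Statistical helpers: per-column summary, a single mean, and the
# correlation between two columns.
summary = df.describe()
mean_weight = df["weight_kg"].mean()
corr = df["height_cm"].corr(df["weight_kg"])

print(summary)
print(bob_height, mean_weight, corr)
```

The same lookups on a plain NumPy array would require tracking row and column positions by hand, which is the main convenience the labels add.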

What are some alternatives to NumPy and Pandas?
MATLAB
Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java.
R Language
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
SciPy
Python-based ecosystem of open-source software for mathematics, science, and engineering. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
Panda
Panda is a cloud-based platform that provides video and audio encoding infrastructure. It features lightning fast encoding, and broad support for a huge number of video and audio codecs. You can upload to Panda either from your own web application using our REST API, or by utilizing our easy to use web interface.
TensorFlow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
How developers use NumPy and Pandas
Vital Labs, Inc. uses NumPy

We utilize NumPy, SciPy, Pandas, and iPython Notebooks to power our analysis and analytics tools.

Morris Clay uses Pandas

Data wrangling, analysis and pre-processing

GadgetSteve uses Pandas

Great data manipulation tool

GadgetSteve uses NumPy

Fast Numeric Processing

Nough You uses NumPy

Fast array operations.

BobStein uses NumPy

big data analysis

Andrea Catalucci uses NumPy

Number crunching