Pandas in Python

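pandas is an open-source data analysis library for Python. It is not part of the Python standard library, so it has to be installed separately (for example with pip install pandas).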

To follow along, import the two main data structures and the package itself:


from pandas import Series, DataFrame

import pandas as pd

To work with pandas, we will need to get comfortable with its two workhorse data structures: Series and DataFrame. While they are not a universal solution for every problem, they provide a solid, easy-to-use basis for most applications. A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data:
obj = Series([4, 7, -5, 3])
obj
0    4
1    7
2   -5
3    3
dtype: int64
obj.values
array([ 4, 7, -5, 3])
obj.index
RangeIndex(start=0, stop=4, step=1)
obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
d    4
b    7
a   -5
c    3
dtype: int64
obj2.index
Index(['d', 'b', 'a', 'c'], dtype='object')
obj2.values
array([ 4, 7, -5, 3])
obj2[obj2 > 0]
d    4
b    7
c    3
dtype: int64
obj2 * 3
d    12
b    21
a   -15
c     9
dtype: int64
import numpy as np
np.exp(obj2)
d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
'b' in obj2
True
'e' in obj2
False
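Filtering, scalar arithmetic and NumPy ufuncs all return a new Series with the index labels still attached to their values. Note also that `in` tests the index labels, not the stored values. The minimal sketch below rebuilds `obj2` with the same values as above. The extra `7 in obj2` check is an illustration added here and does not come from the original.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Boolean filtering keeps only the matching labels ('a' is dropped)
positive = obj2[obj2 > 0]

# Scalar arithmetic and NumPy ufuncs return a new Series with the same index
tripled = obj2 * 3
exp_values = np.exp(obj2)

# 'in' checks the index labels, not the stored values
label_check = 'b' in obj2          # True: 'b' is a label
value_check = 7 in obj2            # False: 7 is a value, not a label
value_check_2 = 7 in obj2.values   # True: search the values explicitly

print(positive)
print(tripled)
print(label_check, value_check, value_check_2)
```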
If you have data contained in a Python dict, you can create a Series from it by passing the dict:
sdata = {'Ohio': 35000, 'Texas': 71000,
 'Oregon': 16000, 'Utah': 5000}
pdata = {'Rice': 1500, 'Wheat': 1800, 'Sugar': 4000}
obj3=Series(sdata)
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
states = ['California', 'Ohio', 'Oregon',
 'Texas']
obj4 = Series(sdata, index=states)
obj4
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
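When you pass a dict together with an explicit index, pandas looks up each index label in the dict. A label with no matching key, here 'California', becomes NaN. Because NaN is a float, the dtype is upcast to float64. Keys that are absent from the index, here 'Utah', are dropped. Here is a short sketch that checks these points:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj3 = pd.Series(sdata)                 # index taken from the dict keys
obj4 = pd.Series(sdata, index=states)   # index taken from `states`

print(obj3.dtype)      # every value is an integer
print(obj4.dtype)      # NaN for 'California' forces a float dtype
print('Utah' in obj4)  # keys absent from `states` are dropped
```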
I will use the terms “missing” or “NA” to refer to missing data. The isnull and notnull functions in pandas should be used to detect missing data:
pd.isnull(obj4)
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
pd.notnull(obj4)
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
obj4.isnull()
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
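`isnull` and `notnull` return boolean Series, so they combine naturally with summing and filtering. pandas also provides the aliases `isna` and `notna`. The sketch below shows one way to count and drop the missing entries:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

missing_count = obj4.isnull().sum()   # True counts as 1 -> number of NA entries
present = obj4[obj4.notnull()]        # keep only the non-missing entries
same_as_dropna = obj4.dropna()        # built-in shortcut for the same filter

print(missing_count)
print(present)
```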
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
obj4
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
obj3 + obj4
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
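Arithmetic between two Series aligns on index labels. A label that has a value on only one side therefore produces NaN in the result.

To treat such a one-sided gap as zero, the `add` method accepts a `fill_value`. It fills a value only when one side is missing. 'California' is NaN in `obj4` and absent from `obj3`, so it stays NaN. A sketch:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

plain_sum = obj3 + obj4                     # NaN wherever either side lacks a value
filled_sum = obj3.add(obj4, fill_value=0)   # a value missing on one side only is treated as 0

print(filled_sum)
```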
obj4.name = 'population'
obj4.index.name = 'state'
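Both a Series and its index have a `name` attribute. The names appear when the Series is printed, and some operations carry them along. For example, `to_frame` uses the Series name as the column label. A sketch:

```python
import pandas as pd

obj4 = pd.Series({'Ohio': 35000.0, 'Oregon': 16000.0, 'Texas': 71000.0})
obj4.name = 'population'    # name of the values
obj4.index.name = 'state'   # name of the index

print(obj4)                 # printout includes "state" and "Name: population"
df = obj4.to_frame()        # one-column DataFrame; column label comes from the name
print(df.columns.tolist())
```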
A Series’s index can be altered in place by assignment:
obj.index = ['Bob', 'Steve', 'Jeff',
 'Ryan']
obj.index
Index(['Bob', 'Steve', 'Jeff', 'Ryan'], dtype='object')
obj
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
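Assigning to `obj.index` replaces every label at once, so the new list needs exactly one label per value. A length mismatch raises a `ValueError`. To relabel only some entries, `rename` with a dict is one alternative. A sketch:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']   # 4 labels for 4 values: OK

try:
    obj.index = ['Bob', 'Steve']               # 2 labels for 4 values
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc                       # pandas rejects the length mismatch

renamed = obj.rename({'Jeff': 'Jeffrey'})      # returns a new Series; obj is unchanged
print(renamed)
```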
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row and a column index; it can be thought of as a dict of Series (one per column, all sharing the same index). There are numerous ways to construct a DataFrame, though one of the most common is from a dict of equal-length lists or NumPy arrays:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002],
 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
data
{'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
The resulting DataFrame will have its index assigned automatically, as with Series. The columns follow the key order of the dict (older pandas versions sorted them alphabetically); pass the columns argument to set the order explicitly:
frame = DataFrame(data)
DataFrame(data, columns=['year',
 'state', 'pop'])
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
DataFrame(data, columns=['year', 'pop',
 'state'])
   year  pop   state
0  2000  1.5    Ohio
1  2001  1.7    Ohio
2  2002  3.6    Ohio
3  2001  2.4  Nevada
4  2002  2.9  Nevada
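The `columns` argument both orders and selects. Keys of `data` that are left out of `columns` are not included in the result. Names that do not appear in `data` become all-NaN columns. In the sketch, the column name `area` is hypothetical:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

subset = pd.DataFrame(data, columns=['year', 'pop'])            # 'state' is left out
extra = pd.DataFrame(data, columns=['year', 'state', 'area'])   # 'area' is not in data

print(subset.columns.tolist())        # ['year', 'pop']
print(extra['area'].isnull().all())   # True: the unknown column is all NaN
```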
frame2 = DataFrame(data, columns=['year',
 'state', 'pop', 'debt'],
     index=['one', 'two',
 'three', 'four', 'five'])
frame2
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
frame2.columns
Index(['year', 'state', 'pop', 'debt'],
dtype='object')
frame2['state']
one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
Name: state, dtype: object
frame2.year
one      2000
two      2001
three    2002
four     2001
five     2002
Name: year, dtype: int64
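A column can be retrieved with `frame2['state']` or with attribute access. A row can be retrieved by label with `loc` or by integer position with `iloc`. Each of these returns a Series. A sketch:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])

column = frame2['state']              # a column as a Series indexed by row label
row = frame2.loc['three']             # a row as a Series indexed by column name
first_row = frame2.iloc[0]            # the same idea, by integer position
cell = frame2.loc['three', 'state']   # a single value by row and column label

print(row)
print(cell)
```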
frame2['debt'] = 16.5
frame2
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5
val = Series([-1.2, -1.5, -1.7],
 index=['two', 'four', 'five'])
val
two    -1.2
four   -1.5
five   -1.7
dtype: float64
frame2['debt'] = val
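Assigning a Series to a column aligns it on the DataFrame's index, not on position. Labels missing from `val` ('one' and 'three') therefore get NaN, which overwrites the earlier 16.5. The sketch rebuilds `frame2` and checks the result:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])
frame2['debt'] = 16.5   # broadcast a scalar to every row

val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val    # aligned by row label, not by position

print(frame2['debt'])
```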
Another common form of data is a nested dict of dicts. If passed to DataFrame, it will interpret the outer dict keys as the columns and the inner keys as the row indices:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
pop
{'Nevada': {2001: 2.4, 2002: 2.9},
'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = DataFrame(pop)
frame3
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2000     NaN   1.5
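The row index is the union of all the inner keys. Depending on the pandas version, this union may not come out sorted, as the 2001, 2002, 2000 order above shows. Calling `sort_index` puts the rows in chronological order. A sketch:

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)

print(frame3.columns.tolist())        # outer keys become the columns
chronological = frame3.sort_index()   # reorder rows by year
print(chronological)
```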
frame3.T
        2001  2002  2000
Nevada   2.4   2.9   NaN
Ohio     1.7   3.6   1.5
DataFrame(pop, index=[2001, 2002, 2003])
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2003     NaN   NaN
pdata = {'Ohio': frame3['Ohio'][:-1],
 'Nevada': frame3['Nevada'][:2]}
pdata
{'Ohio': 2001    1.7
2002    3.6
Name: Ohio, dtype: float64, 'Nevada': 2001    2.4
2002    2.9
Name: Nevada, dtype: float64}
DataFrame(pdata)
      Ohio  Nevada
2001   1.7     2.4
2002   3.6     2.9
Reindexing

A critical method on pandas objects is reindex, which creates a new object with the data conformed to a new index:
obj = Series([4.5, 7.2, -5.3, 3.6],
 index=['d', 'b', 'a', 'c'])
obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
Labels that were not in the original index ('e') get NaN; the fill_value argument substitutes another value:
obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
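For ordered data such as time series, `reindex` can also fill new labels by carrying the last known value forward with `method='ffill'`, instead of inserting NaN. This requires the original index to be monotonic. The colour values in the sketch are illustrative:

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Labels 1, 3 and 5 are new; ffill copies the last known value forward
filled = obj3.reindex(range(6), method='ffill')
print(filled)
```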

Data sources and pandas methods

Data sources for a data science project can be divided into the following categories:

Databases: Most CRM, ERP, and other business tools store their data in a database. Depending on the volume, velocity, and variety of the data, this can be a traditional (relational) database or a NoSQL database. To connect to most popular databases from Python, we need JDBC/ODBC drivers; fortunately, such drivers are available for all popular databases. Working with this data involves connecting to these databases from Python, querying the data through Python, and then manipulating it with pandas. We will look at an example of how to do this later in this chapter.
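The connect, query, and manipulate workflow can be sketched with Python's built-in `sqlite3` module. Here it stands in for an enterprise database reached over JDBC/ODBC, an assumption made so the example runs without a database server. The `sales` table and its rows are hypothetical. `pd.read_sql_query` runs a SQL query on the connection and returns the result as a DataFrame:

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for a production database
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE sales (region TEXT, amount REAL)')   # hypothetical table
con.executemany('INSERT INTO sales VALUES (?, ?)',
                [('East', 120.0), ('West', 80.0), ('East', 45.5)])
con.commit()

# Query through Python, then manipulate the result with pandas
df = pd.read_sql_query('SELECT region, amount FROM sales', con)
totals = df.groupby('region')['amount'].sum()
con.close()

print(totals)
```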

Web Services: Many business application tools, especially Software as a Service (SaaS) tools, make their data accessible through Application Programming Interfaces (APIs) instead of a database. This avoids the infrastructure cost of hosting a database permanently; instead, data is made available as a service, as and when required. An API call can be made from Python, and it returns packets of data in formats such as JSON or XML. This data is then parsed and manipulated with pandas for further use.
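The parsing step can be sketched without a network call. A hard-coded JSON string stands in for the body an API would return, and the payload and its field names are hypothetical. The standard-library `json` module turns the text into Python objects. `pd.DataFrame` then turns the resulting list of records into a table:

```python
import json
import pandas as pd

# Hypothetical JSON body, standing in for the response of an API call
response_text = '''
[
  {"id": 1, "customer": "Acme", "amount": 250.0},
  {"id": 2, "customer": "Globex", "amount": 99.5}
]
'''

records = json.loads(response_text)   # list of dicts
orders = pd.DataFrame(records)        # one row per record, one column per key
print(orders)
```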

Data files: Much of the data used to prototype data science models comes as data files. One example is data from IoT sensors, which in most cases is stored in a flat file such as a .txt or .csv file. Another source of data files is sample data extracted from a database and saved to such files. The outputs of many data science and machine learning algorithms are also stored in files such as CSV, Excel, and .txt files. As another example, the trained weight matrices of a deep learning neural network model can be saved as an HDF file.
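Reading a flat file is typically a single call to `pd.read_csv`. The sketch holds hypothetical sensor readings in memory with `io.StringIO`, so it runs without a file on disk. With a real file, you would pass its path instead:

```python
import io
import pandas as pd

# Hypothetical IoT sensor readings in CSV form
csv_text = """timestamp,sensor_id,temperature
2021-01-01 00:00,s1,21.5
2021-01-01 00:05,s1,21.7
2021-01-01 00:00,s2,19.8
"""

readings = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])
mean_temp = readings.groupby('sensor_id')['temperature'].mean()   # average per sensor
print(mean_temp)
```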

Web and document scraping: Two other sources of data are the tables and text available on web pages. This data is collected from these pages with Python packages such as BeautifulSoup and Scrapy and is stored in a data file or database for further use. Tables and data held in non-data file formats, such as PDF or Word documents, are also a major source of data; these are extracted with tools such as Tesseract (via pytesseract) and tabula-py.
