Pandas in Python

pandas is an open-source Python library for data analysis.

pandas code in Python:


from pandas import Series, DataFrame

import pandas as pd

To work with pandas, we will need to get comfortable with its two workhorse data structures: Series and DataFrame. While they are not a universal solution for every problem, they provide a solid, easy-to-use basis for most applications.

A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data:
obj = Series([4, 7, -5, 3])
obj
0    4
1    7
2   -5
3    3
dtype: int64
obj.values
array([ 4, 7, -5, 3])
obj.index
RangeIndex(start=0, stop=4, step=1)
obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
d    4
b    7
a   -5
c    3
dtype: int64
obj2.index
Index(['d', 'b', 'a', 'c'], dtype='object')
obj2.values
array([ 4, 7, -5, 3])
obj2[obj2 > 0]
d    4
b    7
c    3
dtype: int64
obj2 * 3
d    12
b    21
a   -15
c     9
dtype: int64
import numpy as np
np.exp(obj2)
d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
'b' in obj2
True
'e' in obj2
False
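The membership checks above work because a Series can be treated much like a fixed-length, ordered dict that maps index labels to values. A minimal sketch of label-based access, reusing the same values as obj2 (the name `labeled` is only illustrative):

```python
import pandas as pd

# Same data as obj2 above; `labeled` is a hypothetical name for this sketch.
labeled = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

single = labeled['a']          # look up one value by its label
subset = labeled[['c', 'a']]   # a list of labels returns a new Series
as_dict = labeled.to_dict()    # convert back to a plain Python dict

print(single)
print(subset)
print(as_dict)
```

Passing a list of labels keeps the order of the list, not the order of the original index.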
If you have data contained in a Python dict, you can create a Series from it by passing the dict:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
pdata = {'Rice': 1500, 'Wheat': 1800, 'Sugar': 4000}
obj3=Series(sdata)
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index=states)
obj4
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
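Because sdata has no entry for 'California', its value in obj4 is NaN (not a number), which pandas uses to mark missing values. I will use the terms "missing" or "NA" to refer to missing data. The isnull and notnull functions in pandas should be used to detect missing data: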
pd.isnull(obj4)
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
pd.notnull(obj4)
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
obj4.isnull()
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
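The boolean Series returned by isnull and notnull can be used directly for counting or filtering. A small sketch under the same data as above (the names `missing_mask`, `n_missing`, and `present` are illustrative):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

missing_mask = obj4.isnull()          # True where a value is missing
n_missing = int(missing_mask.sum())   # each True counts as 1
present = obj4[obj4.notnull()]        # keep only labels that have a value

print(n_missing)
print(present)
```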
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
obj4
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
obj3 + obj4
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
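The sum above is aligned on index labels: a label that appears in only one operand produces NaN in the result. A minimal sketch with two small, made-up Series (`left` and `right` are hypothetical names), also showing Series.add with fill_value=0 as one way to treat an absent label as zero:

```python
import pandas as pd

left = pd.Series({'Ohio': 35000, 'Utah': 5000})
right = pd.Series({'Ohio': 35000, 'Texas': 71000})

# The + operator aligns on labels; unmatched labels become NaN.
plain_sum = left + right

# add() with fill_value substitutes 0 for a label missing on one side.
filled_sum = left.add(right, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Whether zero is a sensible stand-in for a missing label depends on the data, so this is a choice to make deliberately rather than a default.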
obj4.name = 'population'
obj4.index.name = 'state'
A Series’s index can be altered in place by assignment:
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj.index
Index(['Bob', 'Steve', 'Jeff', 'Ryan'], dtype='object')
obj
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row and a column index; it can be thought of as a dict of Series that all share the same index. There are numerous ways to construct a DataFrame, though one of the most common is from a dict of equal-length lists or NumPy arrays:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
data
{'pop': [1.5, 1.7, 3.6, 2.4, 2.9], 'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'year': [2000, 2001, 2002, 2001, 2002]}
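The resulting DataFrame will have its index assigned automatically, as with Series. In older versions of pandas the columns are placed in sorted order; recent versions (pandas 0.23 and later on Python 3.6+) keep the order of the dict keys instead. Either way, passing columns lets you specify the order explicitly: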
frame = DataFrame(data)
DataFrame(data, columns=['year', 'state', 'pop'])
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
DataFrame(data, columns=['year', 'pop', 'state'])
   year  pop   state
0  2000  1.5    Ohio
1  2001  1.7    Ohio
2  2002  3.6    Ohio
3  2001  2.4  Nevada
4  2002  2.9  Nevada
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])
frame2
       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN
frame2.columns
Index(['year', 'state', 'pop', 'debt'], dtype='object')
frame2['state']
one        Ohio
two        Ohio
three      Ohio
four     Nevada
five     Nevada
Name: state, dtype: object
frame2.year
one      2000
two      2001
three    2002
four     2001
five     2002
Name: year, dtype: int64
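Attribute-style access such as frame2.year only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute. The 'pop' column in this data is such a collision, because DataFrame already has a pop method. A small sketch (the name `demo` is illustrative):

```python
import pandas as pd

demo = pd.DataFrame({'year': [2000, 2001], 'pop': [1.5, 1.7]})

by_attr = demo.year    # works: 'year' does not clash with any attribute
by_key = demo['pop']   # bracket access always returns the column
clash = demo.pop       # this is the DataFrame.pop method, not the column

print(callable(clash))
print(by_key.tolist())
```

For that reason bracket access is the safer habit whenever column names come from external data.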
frame2['debt'] = 16.5
frame2
       year   state  pop  debt
one    2000    Ohio  1.5  16.5
two    2001    Ohio  1.7  16.5
three  2002    Ohio  3.6  16.5
four   2001  Nevada  2.4  16.5
five   2002  Nevada  2.9  16.5
val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
val
two    -1.2
four   -1.5
five   -1.7
dtype: float64
frame2['debt'] = val
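When a Series is assigned to a column, as in the line above, its labels are matched against the DataFrame's index, and rows without a matching label are left as NaN. A reduced sketch of that behavior (the name `frame` and the single year column are only for illustration):

```python
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001, 2002, 2001, 2002]},
                     index=['one', 'two', 'three', 'four', 'five'])
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

# Assignment aligns val on the row index; 'one' and 'three' get NaN.
frame['debt'] = val

print(frame)
```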
Another common form of data is a nested dict of dicts. If it is passed to DataFrame, pandas interprets the outer dict keys as the columns and the inner keys as the row indices:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
pop
{'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = DataFrame(pop)
frame3
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2000     NaN   1.5
frame3.T
        2001  2002  2000
Nevada   2.4   2.9   NaN
Ohio     1.7   3.6   1.5
DataFrame(pop, index=[2001, 2002, 2003])
      Nevada  Ohio
2001     2.4   1.7
2002     2.9   3.6
2003     NaN   NaN
pdata = {'Ohio': frame3['Ohio'][:-1],
         'Nevada': frame3['Nevada'][:2]}
pdata
{'Nevada': 2001    2.4
 2002    2.9
 Name: Nevada, dtype: float64, 'Ohio': 2001    1.7
 2002    3.6
 Name: Ohio, dtype: float64}
DataFrame(pdata)
      Ohio  Nevada
2001   1.7     2.4
2002   3.6     2.9
Reindexing

A critical method on pandas objects is reindex, which means to create a new object with the data conformed to a new index:
obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
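For ordered data, reindex can also fill new labels by carrying the previous value forward instead of using a constant. A minimal sketch with made-up data (the name `colors` and its values are illustrative, not taken from the example above), using the method='ffill' option of reindex:

```python
import pandas as pd

colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward-fill: each new label takes the value of the last earlier label.
filled = colors.reindex(range(6), method='ffill')

print(filled)
```

Forward filling requires the original index to be sorted, and like the calls above it returns a new object and leaves `colors` unchanged.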










