Data Project: Analysis of the Earth's Surface Temperature¶

Source for the dataset: https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
The raw data comes from the Berkeley Earth data page.

Global Land Temperatures by Country¶

1. Find the average temperature of different countries over a period of 25 years.
2. Which country has the highest mean temperature over the whole record?
3. Which country has the lowest mean temperature over the whole record?
4. For a given country, find the mean temperature of each month (averaged over all years).
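
As a minimal sketch of how questions 2 and 3 could be answered, the per-country means can be passed to `idxmax()` and `idxmin()`. The DataFrame below is made-up toy data that only mirrors the `Country` and `AverageTemperature` column names of the real file; the names `toy`, `country_means`, `hottest` and `coldest` are illustrative assumptions.

```python
import pandas as pd

# Toy data standing in for GlobalLandTemperaturesByCountry.csv (values are made up).
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "AverageTemperature": [10.0, 12.0, 25.0, 27.0, -3.0, None],
})

# Mean per country; missing readings are skipped by default.
country_means = toy.groupby("Country")["AverageTemperature"].mean()

hottest = country_means.idxmax()  # index label of the largest mean
coldest = country_means.idxmin()  # index label of the smallest mean

print(f"Highest mean: {hottest} ({country_means[hottest]:.2f} °C)")
print(f"Lowest mean: {coldest} ({country_means[coldest]:.2f} °C)")
```

The same two calls should work on a Series of real per-country means, provided its index holds the country names.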

Global Average Land Temperature by Cities¶

1. Find the average temperature of different cities over a period of 25 years.
2. Which city has the highest mean temperature?
3. Which city has the lowest mean temperature?
4. For a given city, find the mean temperature of each month (averaged over all years).
5. How temperatures vary with latitude.
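
One possible way to average over 25-year periods is to map each year to the first year of its period with integer division and then group on that value. This is only a sketch on made-up numbers; `start_year`, `period_len` and `period_start` are assumed names, and the choice of 1850 as the first period boundary is an assumption as well.

```python
import pandas as pd

# Toy yearly readings (values are made up).
toy = pd.DataFrame({
    "year": [1850, 1860, 1874, 1875, 1899, 1900],
    "AverageTemperature": [10.0, 12.0, 14.0, 20.0, 22.0, 30.0],
})

start_year = 1850  # assumed first year of the first period
period_len = 25    # length of each period in years

# Integer-divide the offset from start_year to get a period number,
# then convert it back to the first year of that period.
offset = toy["year"] - start_year
toy["period_start"] = start_year + (offset // period_len) * period_len

period_means = toy.groupby("period_start")["AverageTemperature"].mean()
print(period_means)
```

Grouping on `["period_start", "City"]` instead would give one mean per city and period.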

In [1]:
import pandas as pd
import numpy as np
import datetime as date
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')

Part 1¶

In [2]:
#Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)
avglantempcon=pd.read_csv(r"C:\Users\Umesh Potha\Desktop\climatedataglobal\GlobalLandTemperaturesByCountry.csv")
In [3]:
avglantempcon.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 577462 entries, 0 to 577461
Data columns (total 4 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   dt                             577462 non-null  object 
 1   AverageTemperature             544811 non-null  float64
 2   AverageTemperatureUncertainty  545550 non-null  float64
 3   Country                        577462 non-null  object 
dtypes: float64(2), object(2)
memory usage: 17.6+ MB
In [4]:
avglantempcon.head()
Out[4]:
dt AverageTemperature AverageTemperatureUncertainty Country
0 1743-11-01 4.384 2.294 Åland
1 1743-12-01 NaN NaN Åland
2 1744-01-01 NaN NaN Åland
3 1744-02-01 NaN NaN Åland
4 1744-03-01 NaN NaN Åland

To make the time series easier to work with, we convert the strings in the dt column to the datetime type using the built-in pandas to_datetime() function.

In [5]:
avglantempcon['dt']=pd.to_datetime(avglantempcon['dt'])

Since the other datasets do not all start on the same date, let's drop all rows before 1850.

In [6]:
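# .copy() makes the filtered frame independent, so adding columns later does not raise SettingWithCopyWarning
avglantempcon=avglantempcon[avglantempcon['dt']>='1850-01-01'].copy()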
In [7]:
avglantempcon.head()
Out[7]:
dt AverageTemperature AverageTemperatureUncertainty Country
1274 1850-01-01 -9.083 1.834 Åland
1275 1850-02-01 -2.309 1.603 Åland
1276 1850-03-01 -4.801 3.033 Åland
1277 1850-04-01 1.242 2.008 Åland
1278 1850-05-01 7.920 0.881 Åland

We leave the null values in the dataset, because dropping those rows would leave gaps in the time series.
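
Keeping the null rows is safe for the averages that follow because pandas skips NaN values by default when it computes a mean. The sketch below shows this on made-up toy data; the names `toy`, `yearly` and `counts` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data: 1850 has one missing reading, 1851 has only missing readings.
toy = pd.DataFrame({
    "year": [1850, 1850, 1850, 1851, 1851],
    "AverageTemperature": [4.0, np.nan, 6.0, np.nan, np.nan],
})

grouped = toy.groupby("year")["AverageTemperature"]

yearly = grouped.mean()   # NaN values are ignored within each group
counts = grouped.count()  # number of non-null readings per group

print(yearly)
print(counts)
```

A year with no valid readings at all still yields NaN, which matplotlib draws as a gap in the line rather than as a false value.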

Let's find the average temperature across all countries for each year.

In [8]:
ser1=avglantempcon['Country'].unique()
In [9]:
avglantempcon['year'] = avglantempcon['dt'].dt.year
In [10]:
avglantempcon['month'] = avglantempcon['dt'].dt.month
In [11]:
grouped=avglantempcon.groupby('year')['AverageTemperature']
In [12]:
avgmean=grouped.mean()
In [13]:
plt.ylabel('meantemperatures')
plt.xlabel('years')
plt.title('years - meantemperatures')
plt.plot(avgmean,color='green')
Out[13]:
[<matplotlib.lines.Line2D at 0x169da9d8df0>]
In [14]:
grouped1=avglantempcon.groupby('Country')['AverageTemperature']
In [15]:
avgtempcon=grouped1.mean()
In [16]:
lst=list(avgtempcon.index)
In [17]:
plt.figure(figsize=(45,10))
plt.bar(lst[0:100],avgtempcon[0:100],color='purple',width=1)
_=plt.xticks(rotation=90,fontsize=20)
_=plt.yticks(fontsize=20)
_=plt.xlabel('Countries',fontsize=30)
_=plt.ylabel('Mean Temperatures',fontsize=30)
_=plt.title('Countries - Mean Temperature[first 100 countries]',fontsize=30)
In [18]:
plt.figure(figsize=(45,10))
plt.bar(lst[100:200],avgtempcon[100:200],color='purple',width=1)
_=plt.xticks(rotation=90,fontsize=20)
_=plt.yticks(fontsize=20)
_=plt.xlabel('Countries',fontsize=30)
_=plt.ylabel('Mean Temperatures',fontsize=30)
_=plt.title('Countries - Mean Temperature[Next 100 countries]',fontsize=30)
In [19]:
grouped2 = avglantempcon.groupby(['year', 'Country'])['AverageTemperature']
In [20]:
avg_mean_by_year_country =grouped2.mean()
In [21]:
dataset=avg_mean_by_year_country.unstack()
In [22]:
dataset
Out[22]:
Country Afghanistan Africa Albania Algeria American Samoa Andorra Angola Anguilla Antarctica Antigua And Barbuda ... Uruguay Uzbekistan Venezuela Vietnam Virgin Islands Western Sahara Yemen Zambia Zimbabwe Åland
year
1850 13.326083 23.672273 11.734667 22.587333 NaN 10.651750 NaN 26.106333 NaN 25.933250 ... NaN 11.618333 24.539167 23.198417 25.855667 22.182250 NaN 20.423182 20.154364 4.648667
1851 13.605667 NaN 12.315500 22.733333 NaN 10.297083 NaN 26.261250 NaN 26.060917 ... 16.840417 12.042750 24.605250 23.352583 26.006333 22.487000 NaN NaN NaN 5.306333
1852 13.541167 NaN 12.744583 22.856583 NaN 11.503750 NaN 26.026000 NaN 25.855167 ... 16.806167 11.875833 24.491583 23.219333 25.738167 22.128000 NaN NaN NaN 4.995667
1853 13.455833 NaN 12.811167 22.770583 NaN 10.111333 NaN 26.241500 NaN 26.076583 ... 16.780000 11.667750 24.623083 23.504250 25.973833 21.986333 NaN NaN NaN 4.907750
1854 13.605750 NaN 11.951667 22.481167 NaN 10.682417 NaN 26.165333 NaN 25.983583 ... 16.951833 12.074583 24.572583 23.497250 25.890167 22.003667 NaN NaN NaN 5.626667
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2009 15.257750 25.026500 13.844250 24.154333 27.034250 12.566667 22.316500 27.468583 NaN 27.277333 ... 17.871333 13.700333 26.084917 24.465583 27.238500 23.381083 27.342417 21.670250 21.377250 6.489083
2010 15.828667 25.472500 13.775417 25.215667 27.453417 11.480833 22.681500 27.856000 NaN 27.735417 ... 17.920083 14.325917 26.150250 24.833333 27.593667 24.114250 27.302750 22.267500 21.986250 4.861917
2011 15.518000 24.786500 13.443250 24.144167 27.009500 12.994417 22.029667 27.528333 NaN 27.296167 ... 17.824583 13.141083 25.677333 23.692583 27.159250 23.401250 27.288250 21.771583 21.602417 7.170750
2012 14.481583 24.725917 13.768250 23.954833 27.201417 12.339917 22.123333 27.639250 NaN 27.433500 ... 18.509000 13.144167 25.688583 24.704333 27.360167 23.303417 27.445000 21.697750 21.521333 6.063917
2013 16.533625 25.208750 14.993875 25.121500 27.517250 12.307875 22.507875 27.363000 NaN 27.249625 ... 16.754375 16.188250 25.912875 25.232125 27.312333 23.744250 28.129750 21.196000 20.710750 6.229750

164 rows × 243 columns

In [23]:
for x in dataset.columns[0:10]:
    plt.plot(dataset.index,dataset[x],label=x)
plt.title('MeanTemperatures - Years')
plt.ylabel('MeanTemperatures')
plt.xlabel('Years')
plt.legend(loc='best',title="Countries")    
Out[23]:
<matplotlib.legend.Legend at 0x169de7a1cd0>
In [24]:
plt.title('MeanTemperatures - Years')
plt.ylabel('MeanTemperatures')
plt.xlabel('Years')
# Plot the lines first, then build the legend once their labels exist
for x in dataset.columns[51:61]:
    plt.plot(dataset.index,dataset[x],label=x)
plt.legend(loc='best',title='Countries')
Out[24]:
<matplotlib.legend.Legend at 0x169dfae5d90>
In [25]:
month_names = {
    1: 'January',
    2: 'February',
    3: 'March',
    4: 'April',
    5: 'May',
    6: 'June',
    7: 'July',
    8: 'August',
    9: 'September',
    10: 'October',
    11: 'November',
    12: 'December'
}

avglantempcon['month'] = avglantempcon['month'].map(month_names)
In [26]:
grouped3=avglantempcon.groupby('month')['AverageTemperature']
avgmeanmonth=grouped3.mean()
In [27]:
avgmeanmonth=avgmeanmonth.reindex(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
In [28]:
plt.title('Months - Mean Temperatures')
plt.xlabel('Months')
plt.ylabel('Mean Temperature')
_=plt.xticks(rotation=90)
plt.plot(avgmeanmonth,color='red')
plt.show()
In [29]:
grouped4 = avglantempcon.groupby(['month', 'Country'])['AverageTemperature']
In [30]:
avgmeantempmon=grouped4.mean()
In [31]:
dataset2=avgmeantempmon.unstack()
In [32]:
dataset2=dataset2.reindex(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
In [33]:
plt.xlabel('Months')
plt.ylabel('Temperatures')
plt.title('Months Wise Mean of temperatures of Different Countries')
for x in dataset2.columns[0:5]:
    plt.plot(dataset2.index,dataset2[x],label=x)
_=plt.xticks(rotation=45)    
plt.legend()
Out[33]:
<matplotlib.legend.Legend at 0x169dd77cdc0>

Part 2¶

In [34]:
# Find the average temperature of different cities over a period of 25 years
In [35]:
climate=pd.read_csv(r"C:\Users\Umesh Potha\Desktop\climatedataglobal\GlobalLandTemperaturesByMajorCity.csv")
In [36]:
city=(climate[climate['City'].isin(['Delhi','Bombay','Madras','Bangalore','Calcutta'])])
city.head()
Out[36]:
dt AverageTemperature AverageTemperatureUncertainty City Country Latitude Longitude
17335 1796-01-01 22.672 2.317 Bangalore India 12.05N 77.26E
17336 1796-02-01 24.420 1.419 Bangalore India 12.05N 77.26E
17337 1796-03-01 26.092 2.459 Bangalore India 12.05N 77.26E
17338 1796-04-01 27.687 1.746 Bangalore India 12.05N 77.26E
17339 1796-05-01 27.619 1.277 Bangalore India 12.05N 77.26E
In [37]:
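# ISO-formatted date strings compare correctly as text; note that '>' excludes 1988-01-01 itself
# .copy() avoids SettingWithCopyWarning when columns are added below
city=city[city['dt']>'1988-01-01'].copy()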
In [38]:
city['dt']=pd.to_datetime(city['dt'])
In [39]:
city['year']=city['dt'].dt.year
In [40]:
group1=city.groupby(['year','City'])['AverageTemperature']
In [41]:
data=group1.mean().unstack()
data.head()
Out[41]:
City Bangalore Bombay Calcutta Delhi Madras
year
1988 25.717909 27.477636 27.550364 27.020182 29.414182
1989 25.198417 26.913083 26.494583 25.432583 28.668583
1990 25.254833 26.975250 26.362417 25.538250 28.764833
1991 25.487083 26.961167 26.541833 25.738667 28.833583
1992 25.159250 27.074417 26.412667 25.521083 28.660833
In [42]:
plt.xlabel('Years')
plt.ylabel('mean Temperature')
plt.title('Years-temperatures')
for i in ['Delhi','Bombay','Madras','Bangalore','Calcutta']:
    plt.plot(data[i],label=i,linestyle='--')
plt.legend(loc='upper left',title='Cities')
Out[42]:
<matplotlib.legend.Legend at 0x169de2cdd00>
In [43]:
filtered_df = climate[climate['Country'].isin(['India'])]
In [44]:
city_mean_temperatures = filtered_df.groupby('City')['AverageTemperature'].mean().reset_index()

highest_temp_city = city_mean_temperatures.loc[city_mean_temperatures['AverageTemperature'].idxmax()]
lowest_temp_city = city_mean_temperatures.loc[city_mean_temperatures['AverageTemperature'].idxmin()]
city_mean_temperatures
Out[44]:
City AverageTemperature
0 Ahmadabad 26.529853
1 Bangalore 24.855896
2 Bombay 26.631452
3 Calcutta 26.042152
4 Delhi 25.165861
5 Hyderabad 26.869335
6 Jaipur 25.393058
7 Kanpur 24.760041
8 Lakhnau 24.760041
9 Madras 28.417858
10 Nagpur 25.655016
11 New Delhi 25.165861
12 Pune 24.644615
13 Surat 26.330856
In [45]:
plt.figure(figsize=(10, 6))
plt.bar(city_mean_temperatures['City'], city_mean_temperatures['AverageTemperature'])
plt.xlabel('Cities')
plt.ylabel('Mean Temperature (°C)')
plt.title('Mean Temperature in Different Cities')
plt.xticks(rotation=45)
plt.show()

print(f"The city with the highest mean temperature among the specified cities is {highest_temp_city['City']} with an average temperature of {highest_temp_city['AverageTemperature']:.2f} °C.")
print(f"The city with the lowest mean temperature among the specified cities is {lowest_temp_city['City']} with an average temperature of {lowest_temp_city['AverageTemperature']:.2f} °C.")
The city with the highest mean temperature among the specified cities is Madras with an average temperature of 28.42 °C.
The city with the lowest mean temperature among the specified cities is Pune with an average temperature of 24.64 °C.
In [46]:
climate['dt']=pd.to_datetime(climate['dt'])
climate['month'] = climate['dt'].dt.month
month_names = {
    1: 'January',
    2: 'February',
    3: 'March',
    4: 'April',
    5: 'May',
    6: 'June',
    7: 'July',
    8: 'August',
    9: 'September',
    10: 'October',
    11: 'November',
    12: 'December'
}

climate['month'] = climate['month'].map(month_names)
In [47]:
climate
Out[47]:
dt AverageTemperature AverageTemperatureUncertainty City Country Latitude Longitude month
0 1849-01-01 26.704 1.435 Abidjan Côte D'Ivoire 5.63N 3.23W January
1 1849-02-01 27.434 1.362 Abidjan Côte D'Ivoire 5.63N 3.23W February
2 1849-03-01 28.101 1.612 Abidjan Côte D'Ivoire 5.63N 3.23W March
3 1849-04-01 26.140 1.387 Abidjan Côte D'Ivoire 5.63N 3.23W April
4 1849-05-01 25.427 1.200 Abidjan Côte D'Ivoire 5.63N 3.23W May
... ... ... ... ... ... ... ... ...
239172 2013-05-01 18.979 0.807 Xian China 34.56N 108.97E May
239173 2013-06-01 23.522 0.647 Xian China 34.56N 108.97E June
239174 2013-07-01 25.251 1.042 Xian China 34.56N 108.97E July
239175 2013-08-01 24.528 0.840 Xian China 34.56N 108.97E August
239176 2013-09-01 NaN NaN Xian China 34.56N 108.97E September

239177 rows × 8 columns

In [48]:
filteredf1 = climate[climate['Country'].isin(['India'])]
In [49]:
grouped1 = filteredf1.groupby(['City','month'])['AverageTemperature']
avgmeantempmon=grouped1.mean()
dataset3=avgmeantempmon.unstack()
dataset3=dataset3.reindex(columns=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
In [50]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.title('Monthly Temperature in all Indian cities')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.grid(True)

cities = ["Ahmadabad", "Bangalore", "Bombay", "Calcutta", "Delhi", "Hyderabad", "Jaipur"]
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']

for city, color in zip(cities, colors):
    plt.plot(dataset3.loc[city], marker='o', linestyle='-', label=city, color=color)

plt.legend(loc='lower center',title='Cities')
plt.show()
In [51]:
plt.figure(figsize=(10, 6))
plt.title('Monthly Temperature in all Indian cities')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.grid(True)

colors = ['red', 'orange', 'purple', 'brown', 'pink', 'gray', 'olive']
cities = ["Kanpur", "Lakhnau", "Madras", "Nagpur", "New Delhi", "Pune", "Surat"]

for city, color in zip(cities, colors):
    plt.plot(dataset3.loc[city], marker='o', linestyle='-', label=city, color=color)
    

plt.legend(loc='lower center',title='Cities')
plt.show()
In [52]:
grouped5 =filteredf1["AverageTemperature"].groupby(filteredf1["Latitude"])
avgmeantemp=grouped5.mean()
In [53]:
plt.ylabel('AverageTemperature')
plt.xlabel('Latitudes')
plt.title('Latitudes - AverageTemperatures')
plt.plot(avgmeantemp)
Out[53]:
[<matplotlib.lines.Line2D at 0x169db821100>]

Surface temperature depends on many factors besides latitude, so latitude alone cannot explain it. This is why the plot looks irregular.
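
Part of the irregularity may also come from the Latitude column being stored as text such as 12.05N, so grouping on it sorts the values as strings rather than as numbers. A minimal sketch of converting such strings to signed degrees follows. `parse_coordinate` is a hypothetical helper, not part of pandas, and the toy rows are made up.

```python
import pandas as pd

def parse_coordinate(text):
    """Convert a string such as '12.05N' or '3.23W' to a signed float.

    Hypothetical helper: south and west are treated as negative.
    """
    value = float(text[:-1])
    return -value if text[-1] in ("S", "W") else value

# Toy rows in the same string format as the Latitude column (values are made up).
toy = pd.DataFrame({
    "Latitude": ["28.13N", "12.05N", "5.63N", "23.31S"],
    "AverageTemperature": [25.0, 24.9, 26.5, 20.0],
})

toy["lat_deg"] = toy["Latitude"].map(parse_coordinate)

# Grouping on the numeric column gives an index in true south-to-north order.
by_lat = toy.groupby("lat_deg")["AverageTemperature"].mean().sort_index()
print(by_lat)
```

Plotting `by_lat` would then place the points on a numeric x axis, so the spacing between latitudes reflects the real distance between them.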
