Data Project: Analysis of the Surface Temperature of Earth
Source for the dataset: https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
The raw data comes from the Berkeley Earth data page.
Global Land Temperatures by Country
1. Find the average temperature of different countries over a period of 25 years.
2. Which country has the highest mean temperature over the whole record?
3. Which country has the lowest mean temperature over the whole record?
4. For a country, find the mean temperature of each month (averaged over all years).
Global Average Land Temperature by Cities
1. Find the average temperature of different cities over a period of 25 years.
2. Which city has the highest mean temperature?
3. Which city has the lowest mean temperature?
4. For a city, find the mean temperature of each month (averaged over all years).
5. How the temperature varies with latitude.
import pandas as pd
import numpy as np
import datetime as date
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')
Part 1
#Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)
avglantempcon=pd.read_csv(r"C:\Users\Umesh Potha\Desktop\climatedataglobal\GlobalLandTemperaturesByCountry.csv")
avglantempcon.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 577462 entries, 0 to 577461
Data columns (total 4 columns):
 #   Column                         Non-Null Count   Dtype
---  ------                         --------------   -----
 0   dt                             577462 non-null  object
 1   AverageTemperature             544811 non-null  float64
 2   AverageTemperatureUncertainty  545550 non-null  float64
 3   Country                        577462 non-null  object
dtypes: float64(2), object(2)
memory usage: 17.6+ MB
avglantempcon.head()
| | dt | AverageTemperature | AverageTemperatureUncertainty | Country |
|---|---|---|---|---|
| 0 | 1743-11-01 | 4.384 | 2.294 | Åland |
| 1 | 1743-12-01 | NaN | NaN | Åland |
| 2 | 1744-01-01 | NaN | NaN | Åland |
| 3 | 1744-02-01 | NaN | NaN | Åland |
| 4 | 1744-03-01 | NaN | NaN | Åland |
To make the time-series work easier, we convert the strings in the dt column to datetime values using the built-in pandas to_datetime() function.
avglantempcon['dt']=pd.to_datetime(avglantempcon['dt'])
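As a small illustration of why the conversion helps, the sketch below parses a few date strings and then reads the year and month through the `.dt` accessor. The `sample` frame and its values are made up for the example; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the values are invented.
sample = pd.DataFrame({
    "dt": ["1849-12-01", "1850-01-01", "1850-02-01"],
    "AverageTemperature": [3.1, -9.083, -2.309],
})

# Parse the ISO-formatted strings into datetime64 values.
sample["dt"] = pd.to_datetime(sample["dt"])

# Once the column holds real datetimes, the .dt accessor exposes
# calendar fields such as the year and the month number.
sample["year"] = sample["dt"].dt.year
sample["month"] = sample["dt"].dt.month

print(sample.dtypes)
print(sample)
```

The same accessor is what the later cells rely on when they add the year and month columns.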
Since the other datasets do not all start on the same date, let's drop all rows before 1850.
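# .copy() makes the filtered frame independent, so adding columns later does not raise SettingWithCopyWarning
avglantempcon=avglantempcon[avglantempcon['dt']>='1850-01-01'].copy()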
avglantempcon.head()
| | dt | AverageTemperature | AverageTemperatureUncertainty | Country |
|---|---|---|---|---|
| 1274 | 1850-01-01 | -9.083 | 1.834 | Åland |
| 1275 | 1850-02-01 | -2.309 | 1.603 | Åland |
| 1276 | 1850-03-01 | -4.801 | 3.033 | Åland |
| 1277 | 1850-04-01 | 1.242 | 2.008 | Åland |
| 1278 | 1850-05-01 | 7.920 | 0.881 | Åland |
We leave the null values in the dataset, since dropping those rows could break the continuity of the time series.
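Leaving the nulls in place is safe for the averages computed below because pandas ignores NaN when taking a mean. A minimal sketch with made-up readings (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Invented readings with one missing value in 1850.
toy = pd.DataFrame({
    "year": [1850, 1850, 1850, 1851],
    "AverageTemperature": [10.0, np.nan, 14.0, 11.0],
})

# mean() skips NaN by default, so 1850 is the average of 10.0 and 14.0 only.
yearly = toy.groupby("year")["AverageTemperature"].mean()
print(yearly)

# The rows themselves are untouched, so no month disappears from the series.
print(len(toy))
```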
Let's find the average temperature across all countries for each year.
ser1=avglantempcon['Country'].unique()
avglantempcon['year'] = avglantempcon['dt'].dt.year
avglantempcon['month'] = avglantempcon['dt'].dt.month
grouped=avglantempcon.groupby('year')['AverageTemperature']
avgmean=grouped.mean()
plt.ylabel('meantemperatures')
plt.xlabel('years')
plt.title('years - meantemperatures')
plt.plot(avgmean,color='green')
grouped1=avglantempcon.groupby('Country')['AverageTemperature']
avgtempcon=grouped1.mean()
lst=list(avgtempcon.index)
plt.figure(figsize=(45,10))
plt.bar(lst[0:100],avgtempcon[0:100],color='purple',width=1)
_=plt.xticks(rotation=90,fontsize=20)
_=plt.yticks(fontsize=20)
_=plt.xlabel('Countries',fontsize=30)
_=plt.ylabel('Mean Temperatures',fontsize=30)
_=plt.title('Countries - Mean Temperature[first 100 countries]',fontsize=30)
plt.figure(figsize=(45,10))
plt.bar(lst[100:200],avgtempcon[100:200],color='purple',width=1)
_=plt.xticks(rotation=90,fontsize=20)
_=plt.yticks(fontsize=20)
_=plt.xlabel('Countries',fontsize=30)
_=plt.ylabel('Mean Temperatures',fontsize=30)
_=plt.title('Countries - Mean Temperature[Next 100 countries]',fontsize=30)
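The bar charts show the per-country means, but the highest and lowest entries can also be read off directly with `idxmax()` and `idxmin()`. The sketch below uses an invented Series in place of `avgtempcon`; the country labels and numbers are placeholders, not results from the dataset.

```python
import pandas as pd

# Stand-in for the per-country mean Series; labels and values are invented.
country_means = pd.Series(
    {"Country A": 25.1, "Country B": -3.4, "Country C": 12.7},
    name="AverageTemperature",
)

# idxmax / idxmin return the index label of the extreme value.
hottest = country_means.idxmax()
coldest = country_means.idxmin()

print(f"Highest mean: {hottest} ({country_means[hottest]:.2f} °C)")
print(f"Lowest mean: {coldest} ({country_means[coldest]:.2f} °C)")
```

In the notebook the same two calls could be applied to `avgtempcon`. Note that the Country column also contains aggregate regions such as Africa, which may need to be filtered out first.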
grouped2 = avglantempcon.groupby(['year', 'Country'])['AverageTemperature']
avg_mean_by_year_country =grouped2.mean()
dataset=avg_mean_by_year_country.unstack()
dataset
| year | Afghanistan | Africa | Albania | Algeria | American Samoa | Andorra | Angola | Anguilla | Antarctica | Antigua And Barbuda | ... | Uruguay | Uzbekistan | Venezuela | Vietnam | Virgin Islands | Western Sahara | Yemen | Zambia | Zimbabwe | Åland |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1850 | 13.326083 | 23.672273 | 11.734667 | 22.587333 | NaN | 10.651750 | NaN | 26.106333 | NaN | 25.933250 | ... | NaN | 11.618333 | 24.539167 | 23.198417 | 25.855667 | 22.182250 | NaN | 20.423182 | 20.154364 | 4.648667 |
| 1851 | 13.605667 | NaN | 12.315500 | 22.733333 | NaN | 10.297083 | NaN | 26.261250 | NaN | 26.060917 | ... | 16.840417 | 12.042750 | 24.605250 | 23.352583 | 26.006333 | 22.487000 | NaN | NaN | NaN | 5.306333 |
| 1852 | 13.541167 | NaN | 12.744583 | 22.856583 | NaN | 11.503750 | NaN | 26.026000 | NaN | 25.855167 | ... | 16.806167 | 11.875833 | 24.491583 | 23.219333 | 25.738167 | 22.128000 | NaN | NaN | NaN | 4.995667 |
| 1853 | 13.455833 | NaN | 12.811167 | 22.770583 | NaN | 10.111333 | NaN | 26.241500 | NaN | 26.076583 | ... | 16.780000 | 11.667750 | 24.623083 | 23.504250 | 25.973833 | 21.986333 | NaN | NaN | NaN | 4.907750 |
| 1854 | 13.605750 | NaN | 11.951667 | 22.481167 | NaN | 10.682417 | NaN | 26.165333 | NaN | 25.983583 | ... | 16.951833 | 12.074583 | 24.572583 | 23.497250 | 25.890167 | 22.003667 | NaN | NaN | NaN | 5.626667 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2009 | 15.257750 | 25.026500 | 13.844250 | 24.154333 | 27.034250 | 12.566667 | 22.316500 | 27.468583 | NaN | 27.277333 | ... | 17.871333 | 13.700333 | 26.084917 | 24.465583 | 27.238500 | 23.381083 | 27.342417 | 21.670250 | 21.377250 | 6.489083 |
| 2010 | 15.828667 | 25.472500 | 13.775417 | 25.215667 | 27.453417 | 11.480833 | 22.681500 | 27.856000 | NaN | 27.735417 | ... | 17.920083 | 14.325917 | 26.150250 | 24.833333 | 27.593667 | 24.114250 | 27.302750 | 22.267500 | 21.986250 | 4.861917 |
| 2011 | 15.518000 | 24.786500 | 13.443250 | 24.144167 | 27.009500 | 12.994417 | 22.029667 | 27.528333 | NaN | 27.296167 | ... | 17.824583 | 13.141083 | 25.677333 | 23.692583 | 27.159250 | 23.401250 | 27.288250 | 21.771583 | 21.602417 | 7.170750 |
| 2012 | 14.481583 | 24.725917 | 13.768250 | 23.954833 | 27.201417 | 12.339917 | 22.123333 | 27.639250 | NaN | 27.433500 | ... | 18.509000 | 13.144167 | 25.688583 | 24.704333 | 27.360167 | 23.303417 | 27.445000 | 21.697750 | 21.521333 | 6.063917 |
| 2013 | 16.533625 | 25.208750 | 14.993875 | 25.121500 | 27.517250 | 12.307875 | 22.507875 | 27.363000 | NaN | 27.249625 | ... | 16.754375 | 16.188250 | 25.912875 | 25.232125 | 27.312333 | 23.744250 | 28.129750 | 21.196000 | 20.710750 | 6.229750 |
164 rows × 243 columns
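The table above holds yearly means. To get the 25-year averages named in the goals, one possible approach is to map each year to the start of its 25-year window and group on that. This is a sketch under assumptions: the `period_start` column, the 1850 starting year, and the toy values are all illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Invented rows for a single placeholder country "X".
toy = pd.DataFrame({
    "year": [1850, 1860, 1874, 1875, 1899, 1900],
    "Country": ["X"] * 6,
    "AverageTemperature": [10.0, 12.0, 14.0, 11.0, 13.0, 15.0],
})

start = 1850   # first year kept after filtering
period = 25    # window length in years

# Integer division assigns each year to the first year of its window:
# 1850-1874 -> 1850, 1875-1899 -> 1875, 1900-1924 -> 1900, and so on.
toy["period_start"] = start + ((toy["year"] - start) // period) * period

by_period = (
    toy.groupby(["period_start", "Country"])["AverageTemperature"]
    .mean()
    .unstack()
)
print(by_period)
```

Applied to `avglantempcon`, this would give one row per 25-year window and one column per country.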
# Plot the yearly mean temperature of the first ten countries
for x in dataset.columns[0:10]:
    plt.plot(dataset.index,dataset[x],label=x)
plt.title('MeanTemperatures - Years')
plt.ylabel('MeanTemperatures')
plt.xlabel('Years')
plt.legend(loc='best',title="Countries")
plt.title('MeanTemperatures - Years')
plt.ylabel('MeanTemperatures')
plt.xlabel('Years')
for x in dataset.columns[51:61]:
    plt.plot(dataset.index,dataset[x],label=x)
# Call legend() only after the labelled lines are drawn; calling it earlier raises a "No artists with labels found" warning
plt.legend(loc='best',title='Countries')
month_names = {
1: 'January',
2: 'February',
3: 'March',
4: 'April',
5: 'May',
6: 'June',
7: 'July',
8: 'August',
9: 'September',
10: 'October',
11: 'November',
12: 'December'
}
avglantempcon['month'] = avglantempcon['month'].map(month_names)
grouped3=avglantempcon.groupby('month')['AverageTemperature']
avgmeanmonth=grouped3.mean()
avgmeanmonth=avgmeanmonth.reindex(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
plt.title('Months - Mean Temperatures')
plt.xlabel('Months')
plt.ylabel('Mean Temperature')
_=plt.xticks(rotation=90)
plt.plot(avgmeanmonth,color='red')
plt.show()
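The cell above restores calendar order with `reindex` because month names sort alphabetically by default. An alternative, shown here only as a sketch, is to store the month as an ordered categorical so that groupby results come out in calendar order on their own. The `toy` frame and its values are invented.

```python
import calendar
import pandas as pd

# Invented readings; only the column names follow the dataset.
toy = pd.DataFrame({
    "dt": pd.to_datetime(["2000-03-01", "2000-01-01", "2000-02-01", "2001-01-01"]),
    "AverageTemperature": [20.0, 10.0, 15.0, 12.0],
})

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]

# An ordered categorical remembers the calendar order of its categories.
toy["month"] = pd.Categorical(
    toy["dt"].dt.month_name(), categories=month_order, ordered=True
)

# observed=True keeps only the months that actually occur in the data.
monthly = toy.groupby("month", observed=True)["AverageTemperature"].mean()
print(monthly)
```

With this approach the explicit list of twelve month names does not have to be repeated for every grouped result.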
grouped4 = avglantempcon.groupby(['month', 'Country'])['AverageTemperature']
avgmeantempmon=grouped4.mean()
dataset2=avgmeantempmon.unstack()
dataset2=dataset2.reindex(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
plt.xlabel('Months')
plt.ylabel('Temperatures')
plt.title('Months Wise Mean of temperatures of Different Countries')
# Plot the monthly mean temperature of the first five countries
for x in dataset2.columns[0:5]:
    plt.plot(dataset2.index,dataset2[x],label=x)
_=plt.xticks(rotation=45)
plt.legend()
Part 2
#Average temperature of selected cities over a period of 25 years
climate=pd.read_csv(r"C:\Users\Umesh Potha\Desktop\climatedataglobal\GlobalLandTemperaturesByMajorCity.csv")
city=(climate[climate['City'].isin(['Delhi','Bombay','Madras','Bangalore','Calcutta'])])
city.head()
| | dt | AverageTemperature | AverageTemperatureUncertainty | City | Country | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 17335 | 1796-01-01 | 22.672 | 2.317 | Bangalore | India | 12.05N | 77.26E |
| 17336 | 1796-02-01 | 24.420 | 1.419 | Bangalore | India | 12.05N | 77.26E |
| 17337 | 1796-03-01 | 26.092 | 2.459 | Bangalore | India | 12.05N | 77.26E |
| 17338 | 1796-04-01 | 27.687 | 1.746 | Bangalore | India | 12.05N | 77.26E |
| 17339 | 1796-05-01 | 27.619 | 1.277 | Bangalore | India | 12.05N | 77.26E |
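# Keep roughly the last 25 years; the dt strings are ISO-formatted, so string comparison is chronological.
# Note that '>' excludes 1988-01-01 itself. .copy() avoids SettingWithCopyWarning on the assignments below.
city=city[city['dt']>'1988-01-01'].copy()
city['dt']=pd.to_datetime(city['dt'])
city['year']=city['dt'].dt.year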
group1=city.groupby(['year','City'])['AverageTemperature']
data=group1.mean().unstack()
data.head()
| year | Bangalore | Bombay | Calcutta | Delhi | Madras |
|---|---|---|---|---|---|
| 1988 | 25.717909 | 27.477636 | 27.550364 | 27.020182 | 29.414182 |
| 1989 | 25.198417 | 26.913083 | 26.494583 | 25.432583 | 28.668583 |
| 1990 | 25.254833 | 26.975250 | 26.362417 | 25.538250 | 28.764833 |
| 1991 | 25.487083 | 26.961167 | 26.541833 | 25.738667 | 28.833583 |
| 1992 | 25.159250 | 27.074417 | 26.412667 | 25.521083 | 28.660833 |
plt.xlabel('Years')
plt.ylabel('mean Temperature')
plt.title('Years-temperatures')
for i in ['Delhi','Bombay','Madras','Bangalore','Calcutta']:
    plt.plot(data[i],label=i,linestyle='--')
plt.legend(loc='upper left',title='Cities')
filtered_df = climate[climate['Country'].isin(['India'])]
city_mean_temperatures = filtered_df.groupby('City')['AverageTemperature'].mean().reset_index()
highest_temp_city = city_mean_temperatures.loc[city_mean_temperatures['AverageTemperature'].idxmax()]
lowest_temp_city = city_mean_temperatures.loc[city_mean_temperatures['AverageTemperature'].idxmin()]
city_mean_temperatures
| | City | AverageTemperature |
|---|---|---|
| 0 | Ahmadabad | 26.529853 |
| 1 | Bangalore | 24.855896 |
| 2 | Bombay | 26.631452 |
| 3 | Calcutta | 26.042152 |
| 4 | Delhi | 25.165861 |
| 5 | Hyderabad | 26.869335 |
| 6 | Jaipur | 25.393058 |
| 7 | Kanpur | 24.760041 |
| 8 | Lakhnau | 24.760041 |
| 9 | Madras | 28.417858 |
| 10 | Nagpur | 25.655016 |
| 11 | New Delhi | 25.165861 |
| 12 | Pune | 24.644615 |
| 13 | Surat | 26.330856 |
plt.figure(figsize=(10, 6))
plt.bar(city_mean_temperatures['City'], city_mean_temperatures['AverageTemperature'])
plt.xlabel('Cities')
plt.ylabel('Mean Temperature (°C)')
plt.title('Mean Temperature in Different Cities')
plt.xticks(rotation=45)
plt.show()
print(f"The city with the highest mean temperature among the specified cities is {highest_temp_city['City']} with an average temperature of {highest_temp_city['AverageTemperature']:.2f} °C.")
print(f"The city with the lowest mean temperature among the specified cities is {lowest_temp_city['City']} with an average temperature of {lowest_temp_city['AverageTemperature']:.2f} °C.")
The city with the highest mean temperature among the specified cities is Madras with an average temperature of 28.42 °C.
The city with the lowest mean temperature among the specified cities is Pune with an average temperature of 24.64 °C.
climate['dt']=pd.to_datetime(climate['dt'])
climate['month'] = climate['dt'].dt.month
month_names = {
1: 'January',
2: 'February',
3: 'March',
4: 'April',
5: 'May',
6: 'June',
7: 'July',
8: 'August',
9: 'September',
10: 'October',
11: 'November',
12: 'December'
}
climate['month'] = climate['month'].map(month_names)
climate
| | dt | AverageTemperature | AverageTemperatureUncertainty | City | Country | Latitude | Longitude | month |
|---|---|---|---|---|---|---|---|---|
| 0 | 1849-01-01 | 26.704 | 1.435 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W | January |
| 1 | 1849-02-01 | 27.434 | 1.362 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W | February |
| 2 | 1849-03-01 | 28.101 | 1.612 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W | March |
| 3 | 1849-04-01 | 26.140 | 1.387 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W | April |
| 4 | 1849-05-01 | 25.427 | 1.200 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W | May |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239172 | 2013-05-01 | 18.979 | 0.807 | Xian | China | 34.56N | 108.97E | May |
| 239173 | 2013-06-01 | 23.522 | 0.647 | Xian | China | 34.56N | 108.97E | June |
| 239174 | 2013-07-01 | 25.251 | 1.042 | Xian | China | 34.56N | 108.97E | July |
| 239175 | 2013-08-01 | 24.528 | 0.840 | Xian | China | 34.56N | 108.97E | August |
| 239176 | 2013-09-01 | NaN | NaN | Xian | China | 34.56N | 108.97E | September |
239177 rows × 8 columns
filteredf1 = climate[climate['Country'].isin(['India'])]
grouped1 = filteredf1.groupby(['City','month'])['AverageTemperature']
avgmeantempmon=grouped1.mean()
dataset3=avgmeantempmon.unstack()
dataset3=dataset3.reindex(columns=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.title('Monthly Temperature in all Indian cities')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.grid(True)
cities = ["Ahmadabad", "Bangalore", "Bombay", "Calcutta", "Delhi", "Hyderabad", "Jaipur"]
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
# city_name avoids overwriting the earlier `city` DataFrame
for city_name, color in zip(cities, colors):
    plt.plot(dataset3.loc[city_name], marker='o', linestyle='-', label=city_name, color=color)
plt.legend(loc='lower center',title='Cities')
plt.show()
plt.figure(figsize=(10, 6))
plt.title('Monthly Temperature in all Indian cities')
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.grid(True)
colors = ['red', 'orange', 'purple', 'brown', 'pink', 'gray', 'olive']
cities = ["Kanpur", "Lakhnau", "Madras", "Nagpur", "New Delhi", "Pune", "Surat"]
# city_name avoids overwriting the earlier `city` DataFrame
for city_name, color in zip(cities, colors):
    plt.plot(dataset3.loc[city_name], marker='o', linestyle='-', label=city_name, color=color)
plt.legend(loc='lower center',title='Cities')
plt.show()
grouped5 =filteredf1["AverageTemperature"].groupby(filteredf1["Latitude"])
avgmeantemp=grouped5.mean()
plt.ylabel('AverageTemperature')
plt.xlabel('Latitudes')
plt.title('Latitudes - AverageTemperatures')
plt.plot(avgmeantemp)
Surface temperature depends on many factors, so latitude alone cannot explain it; this is why the plot looks irregular.
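One practical detail for the latitude plot: the Latitude column holds strings such as `12.05N`, so grouping on it sorts the values as text rather than as numbers. The sketch below converts such strings to signed degrees before grouping. The helper name `parse_coordinate`, the `lat_deg` column, and the sample values are assumptions made for illustration.

```python
import pandas as pd

def parse_coordinate(text):
    """Convert a string such as '12.05N' or '3.23W' to signed degrees.

    South and West are returned as negative values.
    """
    value = float(text[:-1])
    hemisphere = text[-1].upper()
    return -value if hemisphere in ("S", "W") else value

# Invented rows; only the column names follow the dataset.
toy = pd.DataFrame({
    "Latitude": ["28.13N", "12.05N", "5.63N", "23.31S"],
    "AverageTemperature": [25.0, 25.0, 26.0, 18.0],
})

toy["lat_deg"] = toy["Latitude"].map(parse_coordinate)

# Grouping on the numeric column sorts the result from south to north.
by_lat = toy.groupby("lat_deg")["AverageTemperature"].mean()
print(by_lat)
```

For the Indian cities all latitudes are two-digit northern values, so the text ordering happens to agree with the numeric one; the conversion matters once cities from other regions are included.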