Base de données relationnelles, logique SQL.
Le code suivant télécharge les données nécessaires
tp_2a_5_compagnies.zip.
import pyensae.datasource
pyensae.datasource.download_data("tp_2a_5_compagnies.zip")
downloading of http://www.xavierdupre.fr/enseignement/complements/tp_2a_5_compagnies.zip to tp_2a_5_compagnies.zip
unzipped Cities.csv to .Cities.csv
unzipped Compagnies.csv to .Compagnies.csv
unzipped Crews.csv to .Crews.csv
unzipped Crews_planes_habilitation.csv to .Crews_planes_habilitation.csv
unzipped Data_exemple.csv to .Data_exemple.csv
unzipped Data_exemple_2.csv to .Data_exemple_2.csv
unzipped Data_exemple_3.csv to .Data_exemple_3.csv
unzipped Data_exemple_4.csv to .Data_exemple_4.csv
unzipped data_model.txt to .data_model.txt
unzipped data_model_to_csv.py to .data_model_to_csv.py
unzipped enonce.txt to .enonce.txt
unzipped Exemple_arborescent.xml to .Exemple_arborescent.xml
unzipped Flights.csv to .Flights.csv
unzipped Incident.csv to .Incident.csv
unzipped Motors.csv to .Motors.csv
unzipped Motor_models.csv to .Motor_models.csv
unzipped Planes.csv to .Planes.csv
unzipped Plane_models.csv to .Plane_models.csv
['.\Cities.csv',
'.\Compagnies.csv',
'.\Crews.csv',
'.\Crews_planes_habilitation.csv',
'.\Data_exemple.csv',
'.\Data_exemple_2.csv',
'.\Data_exemple_3.csv',
'.\Data_exemple_4.csv',
'.\data_model.txt',
'.\data_model_to_csv.py',
'.\enonce.txt',
'.\Exemple_arborescent.xml',
'.\Flights.csv',
'.\Incident.csv',
'.\Motors.csv',
'.\Motor_models.csv',
'.\Planes.csv',
'.\Plane_models.csv']
import os
import pandas
df_Incident = pandas.read_csv('Incident.csv', sep=';')
df_Flights = pandas.read_csv('Flights.csv', sep=';')
df_Crews = pandas.read_csv('Crews.csv', sep=';')
df_Crews_planes_habilitation = pandas.read_csv('Crews_planes_habilitation.csv', sep=';')
df_Planes = pandas.read_csv('Planes.csv', sep=';')
df_Plane_models = pandas.read_csv('Plane_models.csv', sep=';')
df_Motors = pandas.read_csv('Motors.csv', sep=';')
df_Motor_models = pandas.read_csv('Motor_models.csv', sep=';')
df_Compagnies = pandas.read_csv('Compagnies.csv', sep=';')
df_Cities = pandas.read_csv('Cities.csv', sep=';')
|
Plane_id |
Crew_id |
Departure_id |
Arrival_id |
Date |
Id |
|
|
|
|
|
0 |
40 |
85 |
7 |
0 |
datetime.date(2013, 2, 20) |
1 |
67 |
6 |
16 |
4 |
datetime.date(2013, 1, 9) |
2 |
57 |
67 |
1 |
18 |
datetime.date(2013, 8, 7) |
3 |
41 |
69 |
16 |
14 |
datetime.date(2013, 1, 10) |
4 |
60 |
24 |
4 |
7 |
datetime.date(2013, 9, 26) |
|
Captain_age |
Id |
|
0 |
54 |
1 |
53 |
2 |
56 |
3 |
43 |
4 |
45 |
df_Crews_planes_habilitation.head(5)
|
Plane_model_id |
Crew_id |
|
85 |
40 |
6 |
67 |
67 |
57 |
69 |
41 |
24 |
60 |
|
Immatriculation |
Plane_model_id |
Motor_id |
Compagny_id |
Nb_hours |
Id |
|
|
|
|
|
0 |
'Q-KGNK' |
3 |
0 |
0 |
1213 |
1 |
'Q-HTQG' |
3 |
1 |
0 |
7932 |
2 |
'Q-XRMC' |
1 |
2 |
0 |
2444 |
3 |
'Q-GUOA' |
1 |
3 |
0 |
1595 |
4 |
'Q-WGIW' |
0 |
4 |
0 |
2270 |
|
Constructor |
Model_name |
Id |
|
|
0 |
AIRBUS |
A380 |
1 |
AIRBUS |
A350 |
2 |
BOEING |
787 |
3 |
BOEING |
747 |
|
Motor_model_id |
Nb_hours |
Id |
|
|
0 |
4 |
2334 |
1 |
2 |
3609 |
2 |
2 |
7867 |
3 |
3 |
2673 |
4 |
2 |
3871 |
|
Name |
Manufacturer |
Id |
|
|
0 |
SNECMA-123 |
SNECMA |
1 |
SNECMA-246 |
SNECMA |
2 |
GE-247 |
GE |
3 |
GE-656 |
GE |
4 |
ROLLS ROYCE-345 |
ROLLS ROYCE |
|
Name |
Country |
Id |
|
|
0 |
Qatar Airways |
Qatar |
1 |
Air France |
France |
2 |
Iberia |
Espagne |
3 |
Lufthansa |
Allemagne |
4 |
Ryan Air |
Irlande |
|
Name |
Country |
Id |
|
|
0 |
Bombay |
Inde |
1 |
New York |
USA |
2 |
Tokyo |
Japon |
3 |
Paris |
France |
4 |
Francfort |
Allemagne |
Bien sûr, toutes les informations ne sont pas dans les tables telles
quelles, il faudra faire principalement des jointures et des groupby
pour obtenir les informations que l’on souhaite. Pour obtenir une table
contenant les vols avec une colonne “est_incident”, il faut faire :
import numpy as np
try:
df_Flights.reset_index( inplace = True )
df_Incident.reset_index( inplace = True )
except Exception:
pass
## On suppose que cela vient du fait que les index ont déjà été remis à zéros
df_Flight_Incident = pandas.merge( df_Flights, df_Incident, left_on = "Id", right_on = "Flight_id", how="outer" )
df_Flight_Incident["Is_incident"] = np.isnan( df_Flight_Incident["Flight_id"] ) == False
df_Flight_Incident.head(5)
|
Id |
Plane_id |
Crew_id |
Departure_id |
Arrival_id |
Date |
Flight_id |
Is_incident |
0 |
0 |
40 |
85 |
7 |
0 |
datetime.date(2013, 2, 20) |
NaN |
False |
1 |
1 |
67 |
6 |
16 |
4 |
datetime.date(2013, 1, 9) |
NaN |
False |
2 |
2 |
57 |
67 |
1 |
18 |
datetime.date(2013, 8, 7) |
2 |
True |
3 |
3 |
41 |
69 |
16 |
14 |
datetime.date(2013, 1, 10) |
NaN |
False |
4 |
4 |
60 |
24 |
4 |
7 |
datetime.date(2013, 9, 26) |
NaN |
False |
On peut aussi faire des statistiques par ville de départ …
df_Flight_Incident.groupby( "Departure_id" )["Is_incident"].mean()
Departure_id
0 0.146789
1 0.192308
2 0.220930
3 0.222222
4 0.206107
5 0.218391
6 0.180180
7 0.261682
8 0.120879
9 0.131868
10 0.141176
11 0.184466
12 0.097087
13 0.204082
14 0.174312
15 0.186916
16 0.206897
17 0.272727
18 0.189474
19 0.211538
Name: Is_incident, dtype: float64