::p_load(sf, tidyverse) pacman
Hands-On Exercise 2: Geospatial Data Wrangling
Import Packages
Spatial Data
Importing the Datasets
Master Plan 2014 Subzone Boundaries (Web) from data.gov.sg . Web version is used because it’s optimized for web display, with a smaller file size.
= st_read(dsn = "data/geospatial/master-plan-2014-subzone-boundary-web-shp",
mpsz layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Jenpoer\IS415-GAA\Hands-On-Exercises\chapter-02\data\geospatial\master-plan-2014-subzone-boundary-web-shp'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
Cycling Path from LTADataMall
= st_read(dsn = "data/geospatial/CyclingPathGazette",
cyclingpath layer = "CyclingPathGazette")
Reading layer `CyclingPathGazette' from data source
`C:\Jenpoer\IS415-GAA\Hands-On-Exercises\chapter-02\data\geospatial\CyclingPathGazette'
using driver `ESRI Shapefile'
Simple feature collection with 2248 features and 2 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 11854.32 ymin: 28347.98 xmax: 42626.09 ymax: 48948.15
Projected CRS: SVY21
Pre-Schools Location from Dataportal.asia (originally from data.gov.sg)
= st_read("data/geospatial/preschools-location.kml") preschool
Reading layer `PRESCHOOLS_LOCATION' from data source
`C:\Jenpoer\IS415-GAA\Hands-On-Exercises\chapter-02\data\geospatial\preschools-location.kml'
using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Checking the contents of a simple feature data frame
Checking the information about the geometry, such as type, geographic extent of features, and coordinate system of the data.
st_geometry(mpsz)
Geometry set for 323 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 5 geometries:
Find out about the attribute information in the data frame.
glimpse(mpsz)
Rows: 323
Columns: 16
$ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
$ SUBZONE_NO <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1, 3, 2, 2, …
$ SUBZONE_N <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HENDERSON HIL…
$ SUBZONE_C <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03", "BMSZ07",…
$ CA_IND <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N", "N", "N",…
$ PLN_AREA_N <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUKIT MERAH",…
$ PLN_AREA_C <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "QT", "QT",…
$ REGION_N <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "CENT…
$ REGION_C <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR",…
$ INC_CRC <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF02B13E0E5",…
$ FMEL_UPD_D <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05…
$ X_ADDR <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96, 25358.82,…
$ Y_ADDR <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70, 29991.38,…
$ SHAPE_Leng <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594, 4428.913,…
$ SHAPE_Area <dbl> 1630379.27, 559816.25, 160807.50, 595428.89, 387429.44, 103…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULTIPOLYGON (…
Reveal complete information of a feature object, but retrieve the first few rows only.
head(mpsz, n=5)
Simple feature collection with 5 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 25867.68 ymin: 28369.47 xmax: 32362.39 ymax: 30435.54
Projected CRS: SVY21
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1 1 1 MARINA SOUTH MSSZ01 Y MARINA SOUTH
2 2 1 PEARL'S HILL OTSZ01 Y OUTRAM
3 3 3 BOAT QUAY SRSZ03 Y SINGAPORE RIVER
4 4 8 HENDERSON HILL BMSZ08 N BUKIT MERAH
5 5 3 REDHILL BMSZ03 N BUKIT MERAH
PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
1 MS CENTRAL REGION CR 5ED7EB253F99252E 2014-12-05 31595.84
2 OT CENTRAL REGION CR 8C7149B9EB32EEFC 2014-12-05 28679.06
3 SR CENTRAL REGION CR C35FEFF02B13E0E5 2014-12-05 29654.96
4 BM CENTRAL REGION CR 3775D82C5DDBEFBD 2014-12-05 26782.83
5 BM CENTRAL REGION CR 85D9ABEF0A40678F 2014-12-05 26201.96
Y_ADDR SHAPE_Leng SHAPE_Area geometry
1 29220.19 5267.381 1630379.3 MULTIPOLYGON (((31495.56 30...
2 29782.05 3506.107 559816.2 MULTIPOLYGON (((29092.28 30...
3 29974.66 1740.926 160807.5 MULTIPOLYGON (((29932.33 29...
4 29933.77 3313.625 595428.9 MULTIPOLYGON (((27131.28 30...
5 30005.70 2825.594 387429.4 MULTIPOLYGON (((26451.03 30...
Plotting the Geospatial Data
Plot all the features of the data frame in small multiples. max.plot is used to show all the features, since it limits to the first 9 by default.
plot(mpsz, max.plot=15)
Plot only the geometry
plot(st_geometry(mpsz))
Plot the sf object using a specific attribute
plot(mpsz["PLN_AREA_N"])
Projection Transformation
Project a feature to another coordinate system.
st_crs(mpsz)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Even though the projection system is SVY21, when we read the end of print, the EPSG is 9001. This is wrong because EPSG of SVY21 should be 3414.
SVY21 is coordinate system for Singapore.
<- st_set_crs(mpsz, 3414) mpsz3414
However, different case for Preschool. We cannot just replace the EPSG.
st_geometry(preschool)
Geometry set for 1925 features
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
First 5 geometries:
It uses WGS84, which we need to actually project mathematically to SVY21.
<- st_transform(preschool,
preschool3414 crs = 3414)
Aspatial Data
Import data
<- read_csv('data/aspatial/listings.csv')
listings list(listings)
[[1]]
# A tibble: 4,161 × 18
id name host_id host_…¹ neigh…² neigh…³ latit…⁴ longi…⁵ room_…⁶ price
<dbl> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 50646 Pleasan… 227796 Sujatha Centra… Bukit … 1.33 104. Privat… 80
2 71609 Ensuite… 367042 Belinda East R… Tampin… 1.35 104. Privat… 145
3 71896 B&B Ro… 367042 Belinda East R… Tampin… 1.35 104. Privat… 85
4 71903 Room 2-… 367042 Belinda East R… Tampin… 1.35 104. Privat… 85
5 275344 15 mins… 1439258 Kay Centra… Bukit … 1.29 104. Privat… 49
6 289234 Booking… 367042 Belinda East R… Tampin… 1.34 104. Privat… 184
7 294281 5 mins … 1521514 Elizab… Centra… Newton 1.31 104. Privat… 79
8 324945 Cozy Bl… 1439258 Kay Centra… Bukit … 1.29 104. Privat… 49
9 330089 Cozy Bl… 1439258 Kay Centra… Bukit … 1.29 104. Privat… 55
10 330095 10 mins… 1439258 Kay Centra… Bukit … 1.29 104. Privat… 55
# … with 4,151 more rows, 8 more variables: minimum_nights <dbl>,
# number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>,
# number_of_reviews_ltm <dbl>, license <chr>, and abbreviated variable names
# ¹host_name, ²neighbourhood_group, ³neighbourhood, ⁴latitude, ⁵longitude,
# ⁶room_type
Listings is a tibble
is_tibble(listings)
[1] TRUE
Create a simple feature data frame from aspatial data
<- st_as_sf(listings,
listings_sf coords = c("longitude", "latitude"),
crs=4326) %>%
st_transform(crs = 3414)
glimpse(listings_sf)
Rows: 4,161
Columns: 17
$ id <dbl> 50646, 71609, 71896, 71903, 275344, 289…
$ name <chr> "Pleasant Room along Bukit Timah", "Ens…
$ host_id <dbl> 227796, 367042, 367042, 367042, 1439258…
$ host_name <chr> "Sujatha", "Belinda", "Belinda", "Belin…
$ neighbourhood_group <chr> "Central Region", "East Region", "East …
$ neighbourhood <chr> "Bukit Timah", "Tampines", "Tampines", …
$ room_type <chr> "Private room", "Private room", "Privat…
$ price <dbl> 80, 145, 85, 85, 49, 184, 79, 49, 55, 5…
$ minimum_nights <dbl> 92, 92, 92, 92, 60, 92, 92, 60, 60, 60,…
$ number_of_reviews <dbl> 18, 20, 24, 47, 14, 12, 133, 17, 12, 3,…
$ last_review <date> 2014-12-26, 2020-01-17, 2019-10-13, 20…
$ reviews_per_month <dbl> 0.18, 0.15, 0.18, 0.34, 0.11, 0.10, 1.0…
$ calculated_host_listings_count <dbl> 1, 6, 6, 6, 44, 6, 7, 44, 44, 44, 6, 7,…
$ availability_365 <dbl> 365, 340, 265, 365, 296, 285, 365, 181,…
$ number_of_reviews_ltm <dbl> 0, 0, 0, 0, 1, 0, 0, 3, 2, 0, 1, 0, 0, …
$ license <chr> NA, NA, NA, NA, "S0399", NA, NA, "S0399…
$ geometry <POINT [m]> POINT (22646.02 35167.9), POINT (…
Geoprocessing
Buffering
Buffering is reclassification based on distance: classification of within/without a given proximity.
<- st_buffer(cyclingpath,
buffer_cycling dist=5, nQuadSegs = 30) # 5 meter buffers along cycling paths
Calculate area of buffers
$AREA <- st_area(buffer_cycling) buffer_cycling
Show sum of total land involved
sum(buffer_cycling$AREA)
1556978 [m^2]
Point-in-polygon Count
Goal: Identify pre-schools located in each Planning Subzone using st_intersects.
length is used to calculate number of pre-schools that fall inside each zone.
$`PreSch Count`<- lengths(st_intersects(mpsz3414, listings_sf)) mpsz3414
summary(mpsz3414$`PreSch Count`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 2.000 9.709 8.000 253.000
List the subzone with the most number of preschools
top_n(mpsz3414, 1, `PreSch Count`)
Simple feature collection with 1 feature and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 30342.65 ymin: 32035.6 xmax: 31453.96 ymax: 33233.11
Projected CRS: SVY21 / Singapore TM
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N PLN_AREA_C
1 126 6 LAVENDER KLSZ06 N KALLANG KL
REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR Y_ADDR
1 CENTRAL REGION CR A7A07F328A38B6EF 2014-12-05 30874.41 32569.53
SHAPE_Leng SHAPE_Area geometry PreSch Count
1 3609.15 757907.6 MULTIPOLYGON (((31389.56 32... 253
Display count of preschools in subzone number
aggregate(`PreSch Count` ~ SUBZONE_NO, mpsz3414, FUN=sum)
SUBZONE_NO PreSch Count
1 1 426
2 2 372
3 3 522
4 4 340
5 5 314
6 6 337
7 7 124
8 8 101
9 9 146
10 10 154
11 11 91
12 12 55
13 13 57
14 14 35
15 15 16
16 16 39
17 17 7
Count density by planning subzone
$Area <- mpsz3414 %>%
mpsz3414st_area()
<- mpsz3414 %>%
mpsz3414 mutate(`PreSch Density` = `PreSch Count`/Area * 1000000)
head(mpsz3414)
Simple feature collection with 6 features and 18 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 24468.89 ymin: 28369.47 xmax: 32362.39 ymax: 30542.74
Projected CRS: SVY21 / Singapore TM
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1 1 1 MARINA SOUTH MSSZ01 Y MARINA SOUTH
2 2 1 PEARL'S HILL OTSZ01 Y OUTRAM
3 3 3 BOAT QUAY SRSZ03 Y SINGAPORE RIVER
4 4 8 HENDERSON HILL BMSZ08 N BUKIT MERAH
5 5 3 REDHILL BMSZ03 N BUKIT MERAH
6 6 7 ALEXANDRA HILL BMSZ07 N BUKIT MERAH
PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
1 MS CENTRAL REGION CR 5ED7EB253F99252E 2014-12-05 31595.84
2 OT CENTRAL REGION CR 8C7149B9EB32EEFC 2014-12-05 28679.06
3 SR CENTRAL REGION CR C35FEFF02B13E0E5 2014-12-05 29654.96
4 BM CENTRAL REGION CR 3775D82C5DDBEFBD 2014-12-05 26782.83
5 BM CENTRAL REGION CR 85D9ABEF0A40678F 2014-12-05 26201.96
6 BM CENTRAL REGION CR 9D286521EF5E3B59 2014-12-05 25358.82
Y_ADDR SHAPE_Leng SHAPE_Area geometry PreSch Count
1 29220.19 5267.381 1630379.3 MULTIPOLYGON (((31495.56 30... 8
2 29782.05 3506.107 559816.2 MULTIPOLYGON (((29092.28 30... 26
3 29974.66 1740.926 160807.5 MULTIPOLYGON (((29932.33 29... 48
4 29933.77 3313.625 595428.9 MULTIPOLYGON (((27131.28 30... 2
5 30005.70 2825.594 387429.4 MULTIPOLYGON (((26451.03 30... 1
6 29991.38 4428.913 1030378.8 MULTIPOLYGON (((25899.7 297... 7
Area PreSch Density
1 1630379.3 [m^2] 4.906834 [1/m^2]
2 559816.2 [m^2] 46.443811 [1/m^2]
3 160807.5 [m^2] 298.493545 [1/m^2]
4 595428.9 [m^2] 3.358923 [1/m^2]
5 387429.4 [m^2] 2.581115 [1/m^2]
6 1030378.8 [m^2] 6.793618 [1/m^2]
Exploratory Data Analysis (EDA)
Basic plot
hist(mpsz3414$`PreSch Density`)
Fancier plot using ggplot2
ggplot(data=mpsz3414,
aes(x= as.numeric(`PreSch Density`)))+
geom_histogram(bins=20,
color="black",
fill="light blue") +
labs(title = "Are pre-school even distributed in Singapore?",
subtitle= "There are many planning sub-zones with a single pre-school, on the other hand, \nthere are two planning sub-zones with at least 20 pre-schools",
x = "Pre-school density (per km sq)",
y = "Frequency")
Scatter plot between preschool density and preschool count
ggplot(data=mpsz3414,
aes(y = `PreSch Count`,
x= as.numeric(`PreSch Density`)))+
geom_point(color="black",
fill="light blue") +
xlim(0, 40) +
ylim(0, 40) +
labs(title = "",
x = "Pre-school density (per km sq)",
y = "Pre-school count")