Take-Home Exercise 02

Author

Ong Chae Hui

Published

May 17, 2023

Modified

June 3, 2023

1. Overview

With reference to Mini-Challenge 2 of the VAST Challenge 2023, and using appropriate static and interactive statistical graphics methods, we will help FishEye identify companies that may be engaged in illegal, unreported, and unregulated (IUU) fishing.

1.1. The Task

Use visual analytics to identify temporal patterns for individual entities and between entities in the knowledge graph FishEye created from trade records. Categorize the types of business relationship patterns you find.

1.2. Data Source

For this task, we will use the provided mc2_challenge_graph.json file for data analysis and visualisation.

2. Loading and Launching of Required R Packages

The following R packages are used in this exercise.

jsonlite, a JSON parser and generator optimized for statistical data and the web.
tidygraph, which provides a tidy framework for all things relational (networks/graphs, trees, etc.).
ggraph, an extension of the ggplot2 API tailored to graph visualizations, which provides the same flexible approach to building up plots layer by layer.
visNetwork, for network visualization.
lubridate, an R package that makes it easier to work with dates and times.
igraph, a collection of libraries for creating and manipulating graphs and analyzing networks.
tidyverse, a family of modern R packages specially designed to support data science, analysis and communication tasks, including creating static statistical graphs.
DT, which provides an R interface to the JavaScript library DataTables for creating interactive tables on HTML pages.

The code chunk below uses pacman::p_load() to check whether the above packages are installed, install any that are missing, and load them into the R environment.

pacman::p_load(jsonlite, tidygraph, ggraph, visNetwork, lubridate,
               igraph, tidyverse, DT)

3. Data Preparation

We will first load the data file into the environment and then perform data wrangling.

3.1. Loading and Extracting the Data

Based on the VAST 2023 data notes, the dataset column always has the value ‘mc2’, indicating that the data belongs to Mini-Challenge 2. As such, we will not import this column into the R environment.

3.1.1. Load main file mc2_challenge_graph.json

We will first load in the main file, mc2_challenge_graph.json, then extract the nodes and edges (links) information out.

mc2 <- fromJSON("data/mc2_challenge_graph.json")
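To illustrate what fromJSON() returns for a node-link file, the sketch below parses a small inline JSON string with the same top-level nodes and links keys. The toy entities and values are made up for illustration and are not taken from the challenge data.

```r
library(jsonlite)

# Toy node-link JSON with the same top-level keys as the challenge file
# (entity names and values are hypothetical).
toy_json <- '{
  "nodes": [
    {"id": "A", "shpcountry": "Oceanus"},
    {"id": "B"}
  ],
  "links": [
    {"source": "A", "target": "B",
     "arrivaldate": "2030-01-05", "hscode": "306170"}
  ]
}'

toy <- fromJSON(toy_json)

# Arrays of JSON objects are simplified into data frames. A key that is
# missing from some records (shpcountry for "B") is filled with NA.
str(toy$nodes)
str(toy$links)
```

This is why mc2$nodes and mc2$links can be passed directly to as_tibble(), and why some nodes show NA for shpcountry and rcvcountry.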

3.1.1.1. Extracting the nodes data.frame from mc2

mc2_nodes <- as_tibble(mc2$nodes) %>%
  select(id, shpcountry, rcvcountry)

3.1.1.2. Extracting the edges (links) data.frame from mc2

mc2_edges <- as_tibble(mc2$links) %>%
  select(source, target, arrivaldate, hscode,
         weightkg, valueofgoods_omu, volumeteu,
         valueofgoodsusd)

3.1.1.3. Examining the structure of mc2_nodes and mc2_edges data.frames using `glimpse()` of dplyr.

glimpse(mc2_nodes)

Rows: 34,576
Columns: 3
$ id         <chr> "AquaDelight Inc and Son's", "BaringoAmerica Marine Ges.m.b…
$ shpcountry <chr> "Polarinda", NA, "Oceanus", NA, "Oceanus", "Kondanovia", NA…
$ rcvcountry <chr> "Oceanus", NA, "Oceanus", NA, "Oceanus", "Utoporiana", NA, …

glimpse(mc2_edges)

Rows: 5,464,378
Columns: 8
$ source           <chr> "AquaDelight Inc and Son's", "AquaDelight Inc and Son…
$ target           <chr> "BaringoAmerica Marine Ges.m.b.H.", "BaringoAmerica M…
$ arrivaldate      <chr> "2034-02-12", "2034-03-13", "2028-02-07", "2028-02-23…
$ hscode           <chr> "630630", "630630", "470710", "470710", "470710", "47…
$ weightkg         <int> 4780, 6125, 10855, 11250, 11165, 11290, 9000, 19490, …
$ valueofgoods_omu <dbl> 141015, 141015, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ volumeteu        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ valueofgoodsusd  <dbl> NA, NA, NA, NA, NA, NA, 87110, 188140, NA, 221110, 58…

Examination of the data structure

Both mc2_nodes and mc2_edges contain several chr columns. As good practice, we will trim any leading and trailing white space from these columns before performing any analysis.
The arrivaldate column has the format ‘YYYY-MM-DD’ but is stored as chr data type instead of date data type.
A closer examination of the valueofgoods_omu, volumeteu and valueofgoodsusd columns revealed a large number of NA (missing) values. These columns are deemed incomplete and will be dropped from the analysis.
We will also filter out any duplicate records by using the distinct() function.

3.1.1.4. Perform Data Wrangling for mc2_edges

Data Preparation for mc2_edges

Drop the valueofgoods_omu, volumeteu and valueofgoodsusd columns, since they are deemed incomplete for analysis.
Convert arrivaldate from chr data type to date data type by using ymd().
Extract the year component from the arrivaldate field with year().
Trim the leading and trailing white space from the chr columns with trimws().
Remove any duplicate records by using distinct().
Based on the data notes provided in the VAST Challenge, hscode refers to the Harmonized Commodity Description and Coding System Nomenclature (HS) developed by the World Customs Organization (WCO). We will keep only the records whose hscodes are relevant to the context of the challenge, i.e. those prefixed with 301, 302, 303, 304, 305, 306, 307, 308, 309, 1504, 1603, 1604, 1605 and 2301.
Aggregate the records by grouping on source, target, hscode and year, and derive a weight column from the number of records in each group.

# For the edges data frame, we will make use of select() to extract 
# and rearrange the columns.
mc2_edges <- mc2_edges %>%
  select(arrivaldate, source, target, hscode, weightkg) %>%
  mutate(arrivaldate = ymd(arrivaldate)) %>%
  mutate(year = year(arrivaldate)) %>%
  mutate(source = trimws(source)) %>%
  mutate(target = trimws(target)) %>%
  mutate(hscode = trimws(hscode)) %>%
  distinct()
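The effect of trimming before de-duplicating can be checked on a toy table. The sketch below uses made-up records and base R only, with as.Date() and unique() standing in for ymd() and distinct(); it is an illustration of the idea rather than a replacement for the chunk above.

```r
# Toy edge records (hypothetical values). The second row differs from the
# first only by surrounding white space, so it is a hidden duplicate.
toy_edges <- data.frame(
  source      = c("Alpha Ltd", " Alpha Ltd ", "Beta Co"),
  target      = c("Gamma Inc", "Gamma Inc", "Gamma Inc"),
  arrivaldate = c("2030-01-05", "2030-01-05", "2031-07-19"),
  stringsAsFactors = FALSE
)

# Before trimming, all three rows look distinct.
n_before <- nrow(unique(toy_edges))

# Trim the character columns, parse the date and derive the year.
toy_clean <- toy_edges
toy_clean$source      <- trimws(toy_clean$source)
toy_clean$target      <- trimws(toy_clean$target)
toy_clean$arrivaldate <- as.Date(toy_clean$arrivaldate)  # stand-in for ymd()
toy_clean$year        <- as.integer(format(toy_clean$arrivaldate, "%Y"))

# After trimming, the hidden duplicate collapses into one record.
toy_clean <- unique(toy_clean)
n_after <- nrow(toy_clean)

c(before = n_before, after = n_after)
```

Running the de-duplication before the trimming step would have kept both copies of the first record, which is why the order of the steps matters.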

# List of relevant hscodes related to fishing
rel_hscodes_3 <- c("301", "302", "303", "304", "305",
                   "306", "307", "308", "309")

rel_hscodes_4 <- c("1504", "1603", "1604", "1605", "2301")

# Extract the records with the above 1st 3 or 4 characters of hscodes
# using substr().
mc2_edges_aggregated <- mc2_edges %>%
  filter((substr(hscode, 1, 3) %in% rel_hscodes_3) |
         (substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(source, target, hscode, year) %>%
  summarise(weight = n()) %>%
  filter(source!=target) %>%
  arrange(weight) %>%
  ungroup()
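As a small check of the prefix filter and the aggregation, the sketch below applies the same logic to a handful of made-up records. The helper is_fishing_hscode() is hypothetical, and base R aggregate() stands in for group_by() with summarise(weight = n()).

```r
rel_hscodes_3 <- c("301", "302", "303", "304", "305",
                   "306", "307", "308", "309")
rel_hscodes_4 <- c("1504", "1603", "1604", "1605", "2301")

# Hypothetical helper: TRUE when the first 3 or 4 characters of the
# hscode match one of the fishing-related prefixes.
is_fishing_hscode <- function(hscode) {
  substr(hscode, 1, 3) %in% rel_hscodes_3 |
    substr(hscode, 1, 4) %in% rel_hscodes_4
}

# Toy records (made-up entities; one record per shipment).
toy <- data.frame(
  source = c("A", "A", "A", "B", "C"),
  target = c("B", "B", "C", "C", "A"),
  hscode = c("306170", "306170", "160414", "630630", "470710"),
  year   = 2030L,
  stringsAsFactors = FALSE
)

# Keep the fishing-related records only.
kept <- toy[is_fishing_hscode(toy$hscode), ]

# Count the records per source-target-hscode-year combination.
kept$weight <- 1L
toy_agg <- aggregate(weight ~ source + target + hscode + year,
                     data = kept, FUN = sum)
toy_agg
```

The two shipments from A to B under the same hscode and year collapse into a single edge with weight 2, while the non-fishing records are dropped.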

3.1.1.5. Perform Data Wrangling for mc2_nodes

Data Preparation for mc2_nodes

The nodes within the node file must be unique and since we have filtered and cleaned the data in mc2_edges_aggregated, we will make use of this to extract the nodes that are used here.

rbind() is used to combine the data in both source and target columns of mc2_edges_aggregated. We will then extract the unique records to form the nodes data frame.

id1 <- mc2_edges_aggregated %>%
  select(source) %>%
  rename(id = source)
id2 <- mc2_edges_aggregated %>%
  select(target) %>%
  rename(id = target)

# create a new nodes data table derived from the source and target of edge data. This would ensure that only nodes with connections will be included.

mc2_nodes_extracted <- rbind(id1, id2) %>%
  distinct()

# Build tbl_graph using the valid nodes and edges 
mc2_graph <- tbl_graph(nodes = mc2_nodes_extracted,
                       edges = mc2_edges_aggregated,
                       directed = TRUE)
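As a quick sanity check of how tbl_graph() links the two tables, the toy sketch below (with made-up entity names) builds a small graph in the same way. It assumes, as the chunk above does, that the first two columns of the edge table hold the endpoints and that character endpoints are matched against the first column of the node table when there is no name column.

```r
library(tidygraph)

# Toy node and edge tables (entity names are hypothetical).
nodes_toy <- data.frame(
  id = c("Alpha Ltd", "Beta Co", "Gamma Inc"),
  stringsAsFactors = FALSE
)
edges_toy <- data.frame(
  source = c("Alpha Ltd", "Beta Co"),
  target = c("Beta Co", "Gamma Inc"),
  weight = c(3L, 1L),
  stringsAsFactors = FALSE
)

toy_graph <- tbl_graph(nodes = nodes_toy, edges = edges_toy, directed = TRUE)

# In the resulting graph the endpoints are stored as integer row indices
# of the node table, in columns named from and to.
toy_edges_out <- as_tibble(activate(toy_graph, edges))
toy_edges_out
```

Because the edges refer to nodes by row position, a node id created later with row_number() lines up with the from and to columns.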

4. Data Exploration

As part of the data exploration and analysis, we will compute the centrality indices for the nodes and attach them to the nodes as attributes before plotting them for visualisation and analysis. We will also rename the id column to label and create a new id column using row_number(), to ease the building of the network graphs with the visNetwork package.

For ease of analysis, the computed centrality indices are binned into groups, typically Low, Medium and High, using fixed cut-off values for each index.

We will also round the centrality indices to 3 decimal places for the purpose of analysis.
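When binning with cut(), it is worth checking which bin a boundary value lands in, because cut() closes intervals on the right by default. The sketch below uses made-up scores and shortened labels to compare the default with right = FALSE.

```r
# Made-up scores that sit on and around the break points.
scores <- c(0, 199, 200, 299, 300, 750)
breaks <- c(0, 200, 300, Inf)
labels <- c("Low", "Medium", "High")

# Default (right = TRUE): intervals are [0, 200], (200, 300], (300, Inf],
# so 200 is binned as Low and 300 as Medium.
right_closed <- cut(scores, breaks = breaks, labels = labels,
                    include.lowest = TRUE)

# right = FALSE: intervals are [0, 200), [200, 300), [300, Inf],
# so 200 is binned as Medium and 300 as High.
left_closed <- cut(scores, breaks = breaks, labels = labels,
                   include.lowest = TRUE, right = FALSE)

data.frame(scores, right_closed, left_closed)
```

Labels such as "Medium (200-299)" and "High (>=300)" describe left-closed bins, so they correspond to the right = FALSE variant.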

# Renaming the id column to label and
# create a new column, id with the row_number()
mc2_graph <- mc2_graph %>%
  activate(nodes) %>%
  rename(label = id) %>%
  mutate(id=row_number())

mc2_graph <- mc2_graph %>%
  # Compute the degree centrality of each node and bin the scores into logical groups.
  # Note: centrality_degree() defaults to mode = "out", so on this directed graph
  # degree_score counts outgoing edges; use mode = "all" to count both directions.
  mutate(degree_score = centrality_degree()) %>%
  # right = FALSE makes the bins left-closed, e.g. [200, 300), to match the labels
  mutate(deg_bin = cut(degree_score, breaks = c(0, 200, 300, Inf),
                       labels = c("Low\n(0-199)", 
                                  "Medium\n(200-299)", 
                                  "High\n(>=300)\n"),  
                       include.lowest = TRUE, right = FALSE))  %>%

  # Compute the closeness centrality and bin the scores into logical groups
  mutate(closeness_score = round(centrality_closeness(), digits=3)) %>%
  mutate(close_bin = cut(closeness_score,
                         breaks = c(0, 0.5, 0.8, Inf),
                         labels = c("Low\n(<=0.49)", 
                                    "Medium\n(0.5-0.79)", 
                                    "High\n(>=0.8)\n"), 
                         include.lowest = TRUE, right = FALSE)) %>%

  # Compute the betweenness centrality and bin the scores into logical groups
  mutate(betweenness_score = round(centrality_betweenness(), digits = 3)) %>%
  mutate(bet_bin = cut(betweenness_score, 
                       breaks = c(0, 50000, 100000, 500000, Inf),
                       labels = c("Low\n(0-49K)", 
                                  "Medium\n(50K-99K)", 
                                  "High\n(100K-499K)\n",
                                  "Very High\n(>=500K)"),
                       include.lowest = TRUE, right = FALSE)) 


# Compute the in-degree, out-degree and eigenvector centrality indices
# Convert to igraph for computing these centrality indices
igraph_obj <- as.igraph(mc2_graph)

# In-degree centrality
in_degree_centrality <- degree(igraph_obj, mode = "in")

# Out-degree centrality
out_degree_centrality <- degree(igraph_obj, mode = "out")

# Eigenvector centrality (stored as eigen_scores so that the variable
# does not mask the eigen_centrality() function)
eigen_scores <- eigen_centrality(igraph_obj, directed = TRUE)$vector


# Assign the in-degree, out-degree and eigenvector centrality as vertex attributes
V(mc2_graph)$in_degree <- in_degree_centrality
V(mc2_graph)$out_degree <- out_degree_centrality
V(mc2_graph)$eigen_score <- round(eigen_scores, digits=3) 


# Binning the in_degree scores and put as a variable inside mc2_graph
# (right = FALSE makes the bins left-closed so that they match the labels)
mc2_graph <- mc2_graph %>% 
  mutate(in_bin = cut(in_degree, 
                      breaks = c(0, 300, 600, Inf),
                      labels = c("Low\n(<300)", 
                                 "Medium\n(300-599)", 
                                 "High\n(>=600)"),
                      include.lowest = TRUE, right = FALSE)) %>%
  
  # Binning the out_degree scores and put as a variable inside mc2_graph
  # (first break set to 1 so that the "0-1" bin holds only 0 and 1)
  mutate(out_bin = cut(out_degree, 
                       breaks = c(0, 1, 300, 500, Inf),
                       labels = c("0-1", "Low\n(2-300)", 
                                  "Medium\n(301-500)", 
                                  "High\n(>500)"),
                       include.lowest = TRUE)) %>%
  
  # Binning the eigenvector scores and put as a variable inside mc2_graph
  mutate(eigen_bin = cut(eigen_score,
                         breaks = c(0, 0.5, 0.8, Inf),
                         labels = c("Low\n(<=0.49)", 
                                    "Medium\n(0.5-0.79)", 
                                    "High\n(>=0.8)\n"), 
                         include.lowest = TRUE, right = FALSE))

Once we have all the centrality indices included into the graph, we will convert the nodes and edges into tibble data frames for easier data manipulation for visualisation purposes.

# Converting the nodes with the centrality measures into a tibble dataframe
nodes_df <- mc2_graph %>%
  activate(nodes) %>%
  as_tibble()

# Converting the edges into a tibble dataframe
edges_df <- mc2_graph %>%
  activate(edges) %>%
  as_tibble()

4.1. Overview of the Network

In the network graph below, each node represents an individual business entity (label). Selecting an entity from the dropdown list highlights its connections to other entities.

Design Considerations

To avoid cluttering the network graph, all edges with weight <= 50 are filtered out.
For ease of selecting a particular entity, a dropdown list of all the entities (nodes) present in the graph is provided. The list is sorted in ascending order by using arrange().
Directional arrows are included on the edges to distinguish in-flow from out-flow edges.
Hover highlighting is enabled, so that the user can hover the mouse pointer over a node to see the ‘group’ of nodes connected to it.
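The first two considerations can be illustrated on a toy graph. The sketch below uses made-up nodes and edges and base R subsetting (standing in for filter() and arrange()) to show that dropping light edges also drops the nodes that are no longer connected.

```r
# Toy node and edge tables (hypothetical labels and weights).
nodes_toy <- data.frame(
  id    = 1:4,
  label = c("Delta", "Alpha", "Charlie", "Bravo"),
  stringsAsFactors = FALSE
)
edges_toy <- data.frame(
  from   = c(1, 2, 3),
  to     = c(2, 3, 4),
  weight = c(120, 10, 20)
)

# Keep only the heavier edges.
edges_kept <- edges_toy[edges_toy$weight > 50, ]

# Keep only the nodes that still appear at either end of a kept edge,
# then sort them by label so that the dropdown list is alphabetical.
nodes_kept <- nodes_toy[nodes_toy$id %in% c(edges_kept$from, edges_kept$to), ]
nodes_kept <- nodes_kept[order(nodes_kept$label), ]

nodes_kept
```

Only the edge between nodes 1 and 2 survives the weight threshold, so only "Alpha" and "Delta" remain in the node table.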

# Preparing the data for visualisation
edges <- edges_df %>%
  filter(weight > 50)

nodes <- nodes_df %>%
  filter(id %in% c(edges$from, edges$to)) %>%
  arrange(label)

# Building the network graph using visNetwork package
vis_nw <- visNetwork(nodes, edges,
                     main = "An Overview of the Network Graph") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 1234) %>%
  visNodes() %>%
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T),
             nodesIdSelection = TRUE) %>%
  visEdges(arrows = "to", smooth = list(enabled = TRUE, type = "curvedCW"))

vis_nw

Observations

From the overview network graph, we noticed that the middle of the graph is denser than the rest, indicating that the nodes there probably have more connections to other nodes. We were able to identify the ids of these nodes by zooming into the graph. We can also examine an individual node’s connectivity to other nodes by clicking on it or hovering over it with the mouse pointer. However, navigating around the graph in this way requires considerable effort from the user.

We will make use of the different centrality indices computed earlier to uncover more insights in the subsequent sections.

4.2. The Important Nodes of the Network - Eigenvector Centrality

Eigenvector centrality index captures the overall importance of a node based on its connections and the importance of those connections in the network. Nodes with high eigenvector centrality are connected to other nodes that are also highly central or influential.
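To make the idea concrete, the sketch below computes eigenvector centrality with igraph on a small made-up graph. An undirected graph is used here purely to keep the toy example simple; the entity names are hypothetical.

```r
library(igraph)

# Toy undirected graph: H is linked to A, B and C; A and B are also
# linked to each other; D hangs off C.
toy_edges <- data.frame(
  from = c("H", "H", "H", "A", "C"),
  to   = c("A", "B", "C", "B", "D"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# Scores are scaled so that the most central node has a score of 1.
ev <- eigen_centrality(g)$vector
round(ev, 3)

# The node with the highest score.
top_node <- names(ev)[which.max(ev)]
top_node
```

H scores highest because it has the most neighbours and those neighbours are themselves well connected, whereas D, which is attached only to C, scores lowest.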

We will attempt to identify the main hub nodes within the network by looking at the nodes with high eigenvector centrality index.

Firstly, we will take a look at the distribution of this centrality index across all the nodes to gain an overview picture of the network.

Design Considerations

Because the counts differ greatly between bins, the count for each bin is displayed at the top of its bar for easy reference.

Distribution of Eigenvector Centrality Index
Code Chunk

# Plotting the Distribution for Eigenvector Centrality Index

# Get the count of the records in each bin
nodes_df_eigen_gp <- nodes_df %>%
  group_by(eigen_bin) %>%
  summarise(cnt = n())

# Create a distribution with bar chart using ggplot2
plot_eigen <- ggplot(nodes_df_eigen_gp, aes(x = eigen_bin, y = cnt)) +
  geom_bar(stat = "identity", fill = "deepskyblue3") +
  geom_text(aes(label = cnt), vjust = -0.5, color = "black") +
  labs(x = "\nEigenvector Centrality Index", y = "Count\n", 
       title = "Distribution of Eigenvector Centrality Index") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 

plot_eigen

Observations from the Distribution of Eigenvector Centrality Index

The graph above revealed that there are 4 nodes with an eigenvector centrality index of >= 0.5. This means that these nodes are likely to be connected to other important nodes in the network.

A quick examination of the eigenvector centrality index of the nodes revealed that 11,657 out of 11,688 nodes (>99%) have an eigenvector centrality index of <0.1. Hence, for the purpose of visual analysis, we will focus on these 4 nodes and their connectivity with their immediate neighbouring nodes.

4.2.1. Comparing the Different Centrality Indices

We will take a closer look at the in-degree centrality index and out-degree centrality index for these 4 nodes which have high eigenvector centrality indices.

# Extract the entities with an eigenvector centrality index >= 0.5
nodes_df_eigen_top4 <- nodes_df %>%
  filter(eigen_score >= 0.5) %>%
  select(id, label, eigen_score, eigen_bin, in_degree, in_bin, 
         out_degree, out_bin) %>%
  arrange(desc(eigen_score))

datatable(nodes_df_eigen_top4, 
          class="stripe",
          caption = "\nTable 1: Business Entities with High Eigenvector Centrality Index\n",
          colnames = c("ID", "Entity", "Eigenvector Centrality Index", 
                       "Eigenvector Centrality Category", 
                       "In-Degree Centrality Index",
                       "In-Degree Centrality Category",
                       "Out-Degree Centrality Index",
                       "Out-Degree Centrality Category"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

Based on the information presented in Table 1 above, we observed that these 4 nodes also have a very high in-degree centrality index and a very low out-degree centrality index, suggesting that they are likely to be transhipment points, fishery wharves or central collection centres within the network.

4.2.2. Eigenvector Centrality Network Graph

# get the company names for the edges
edges_df_eigen <- edges_df %>%
  filter(from %in% nodes_df_eigen_top4$id |
         to %in% nodes_df_eigen_top4$id) %>%
  filter(weight > 50)

# build the nodes table
nodes_eigen <- nodes_df %>%
  filter(id %in% c(edges_df_eigen$from, edges_df_eigen$to)) %>%
  rename(group = eigen_bin) %>%
  arrange(label)


visNetwork(nodes_eigen,
           edges_df_eigen,
           main = "Network Graph of the Top 4 Entities with High Eigenvector Centrality") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 1234) %>%
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T),
             nodesIdSelection = TRUE) %>%
  visNodes(id = nodes_eigen$id, size=50) %>%
  visLegend(width = 0.1, position = "right", main = "Group") %>%
  visEdges(arrows = "to", 
           smooth = list(enabled = TRUE, 
                         type = "curvedCW"))

Observations

The network graph revealed that all the links to the 4 nodes with high eigenvector centrality were in-flows from other nodes. This supports the suggestion that these nodes are likely to be transhipment points, fishery wharves or central collection centres within the network.

4.3. Looking at the Degree of Centrality

Degree centrality is a measure of the importance or popularity of a node in a network based on the number of direct connections it has with other nodes in the network.
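On a directed graph, the degree of a node depends on which edge directions are counted. The sketch below uses igraph's degree() on a small made-up graph to show the three modes. Note that tidygraph's centrality_degree() uses mode = "out" unless told otherwise.

```r
library(igraph)

# Toy directed graph (hypothetical entities): three suppliers ship to
# a hub H, and H ships to one buyer B.
toy_edges <- data.frame(
  from = c("S1", "S2", "S3", "H"),
  to   = c("H",  "H",  "H",  "B"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(toy_edges, directed = TRUE)

deg_in  <- degree(g, mode = "in")   # incoming edges only
deg_out <- degree(g, mode = "out")  # outgoing edges only
deg_all <- degree(g, mode = "all")  # both directions

data.frame(node = names(deg_all), deg_in, deg_out, deg_all,
           row.names = NULL)
```

The hub H has an in-degree of 3, an out-degree of 1 and a total degree of 4, so the ranking of nodes can differ considerably depending on the mode chosen.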

Distribution of Degree Centrality Index
Code Chunk

# Plotting the Distribution for Degree Centrality Index
nodes_df_deg_gp <- nodes_df %>%
  group_by(deg_bin) %>%
  summarise(cnt = n())

# Create a bar chart with count labels using ggplot2
plot <- ggplot(nodes_df_deg_gp, aes(x = deg_bin, y = cnt)) +
  geom_bar(stat = "identity", fill = "deepskyblue3") +
  geom_text(aes(label = cnt), vjust = -0.5, color = "black") +
  labs(x = "\nDegree Centrality", y = "Count\n", 
       title = "Distribution of Degree Centrality of the Nodes") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 

plot

Observations from the Distribution of Degree Centrality Index

The distribution chart above revealed that more than 99% of the nodes have a low degree centrality index. There are 18 nodes with a high degree centrality index of >= 300, which means that these nodes have a significantly higher number of direct connections to other nodes within the network.

4.3.1. Comparing the Different Centrality Indices

We will take a look at the different centrality indices for these 18 nodes.

# Extract the nodes with high degree centrality index

nodes_df_deg <- nodes_df %>%
  filter(degree_score >= 300) %>%
  select(id, label, degree_score, deg_bin, 
         in_degree, in_bin, out_degree, out_bin) %>%  
  arrange(desc(degree_score))

datatable(nodes_df_deg, 
          class="stripe",
          caption = "\nTable 2: Business Entities with High Degree Centrality Index\n",
          colnames = c("ID", "Entity", 
                       "Degree Centrality Index",
                       "Degree Centrality Category", 
                       "In-Degree Centrality Index",
                       "In-Degree Centrality Category",
                       "Out-Degree Centrality Index",
                       "Out-Degree Centrality Category"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

Based on the information presented in Table 2 above, we observed that the nodes with high degree centrality indices also have a very high out-degree centrality index. In addition, the majority of these nodes also have some incoming links from other nodes, suggesting that these nodes are likely to be transhipment points within the network.

4.3.2. Degree Centrality Network Graph

# get the entity names for the edges
edges_df_deg <- edges_df %>%
  filter(from %in% nodes_df_deg$id |
         to %in% nodes_df_deg$id) %>%
  filter(weight > 30)

# build the nodes table
nodes_deg <- nodes_df %>%
  filter(id %in% c(edges_df_deg$from, edges_df_deg$to)) %>%
  rename(group = deg_bin) %>%
  arrange(label)

visNetwork(nodes_deg,
           edges_df_deg,
           main = "Entities with High Degree Centrality Index") %>%
  visIgraphLayout(layout = "layout_with_kk") %>%
  visLayout(randomSeed = 1234) %>%
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T),
             nodesIdSelection = TRUE) %>%
  visNodes(id = nodes_deg$id, size=30) %>%
  visLegend(width = 0.1, position = "right", main = "Group") %>%  
  visEdges(arrows = "to", 
           smooth = list(enabled = TRUE))

Observations

We observed that the nodes with a high degree centrality index (in yellow) generally have outgoing connections to other nodes, which suggests that these nodes are likely to be the bigger fishing vessels or transhipment vessels. In particular, 2 of these nodes (The Salty Dog Limited Liability Company and xiǎo xiā Oyj Marine ecology) also have incoming links.

Given that illegal fishing in international waters would typically involve large vessels acting as transhipment vessels to ‘launder’ the illegally caught fish in exchange for other resources such as fuel, we will examine the hscodes associated with these 2 entities as part of the investigation.

4.3.3. Close Examination of “The Salty Dog Limited Liability Company”

We will extract all the non-fishing related hscodes that involve this company as either the source or the target, to understand what other resources were being traded with the other entities. Once we have extracted the relevant records, we will count the number of edges with n() and analyse the 10 records with the highest counts.

## Salty Dog as the Source
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
salty_dog_source_details <- mc2_edges %>%
  filter(source == "The Salty Dog Limited Liability Company") %>%
  filter(source != target) %>%
  group_by(target, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  distinct() %>%
  ungroup()

# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
salty_dog_source_hscodes <- salty_dog_source_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


## Salty Dog as the Target
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
salty_dog_target_details <- mc2_edges %>%
  filter(target == "The Salty Dog Limited Liability Company") %>%
  filter(source != target) %>%
  group_by(source, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  arrange(desc(weight)) %>%
  distinct() %>%
  ungroup()


# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
salty_dog_target_hscodes <- salty_dog_target_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


# Presenting the top 10 records in a data table (Source) for analysis
datatable(head(salty_dog_source_hscodes,10), 
          class="stripe",
          caption = "\nTable 3: List of Business Entities with 'Salty Dog' as the Source\n",
          colnames = c("Target Entity", "Weightkg", 
                       "HS Code", "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

# Presenting the top 10 records in a data table (Target) for analysis
datatable(head(salty_dog_target_hscodes,10), 
          class="stripe",
          caption = "\nTable 4:  List of Business Entities with 'Salty Dog' as the Target\n",
          colnames = c("Source Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

Salty Dog as the Source Provider

Table 3 shows the list of business entities with ‘Salty Dog’ as the source provider. From the list of HS codes, we noted that ‘The Salty Dog Limited Liability Company’ also provides other forms of consumables, particularly vegetables (hscodes starting with 710 to 713), to the respective parties. Based on the values in the Count column for these products, we conclude that the trades were unlikely to be related to IUU fishing.

Salty Dog as the Target Recipient

Table 4 shows the list of business entities with ‘Salty Dog’ as the target recipient. From the list of HS codes, we noticed that ‘The Salty Dog Limited Liability Company’ accepts not only fish and seafood, but also other products such as bags and container cases/boxes with an outer surface of plastics or textile materials (hscode == 420212). However, given that the values in the Count column were not extremely high, we conclude that the trades were unlikely to be related to IUU fishing.

4.3.4. Close Examination of “xiǎo xiā Oyj Marine ecology”

Similarly, for “xiǎo xiā Oyj Marine ecology”, we will extract all the non-fishing related hscodes that involve this company as either the source or the target, to understand what other resources were being traded with the other entities. Once we have extracted the relevant records, we will count the number of edges with n() and analyse the 10 records with the highest counts.
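Since the same extraction is repeated for each entity and each direction, it could also be wrapped in a small function. The sketch below is one possible way to do so. The function count_other_hscodes() and the toy records are hypothetical and not part of the original analysis, and the non-fishing filter is applied before counting rather than after, which yields the same rows.

```r
library(dplyr)

rel_hscodes_3 <- c("301", "302", "303", "304", "305",
                   "306", "307", "308", "309")
rel_hscodes_4 <- c("1504", "1603", "1604", "1605", "2301")

# Hypothetical helper: count the non-fishing records that involve
# `entity` in the given role ("source" or "target"), per trading
# partner, weightkg, hscode and year.
count_other_hscodes <- function(edges, entity, role = c("source", "target")) {
  role <- match.arg(role)
  partner <- if (role == "source") "target" else "source"

  edges %>%
    filter(.data[[role]] == entity, source != target) %>%
    filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
           !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
    group_by(across(all_of(c(partner, "weightkg", "hscode", "year")))) %>%
    summarise(weight = n(), .groups = "drop") %>%
    arrange(desc(weight))
}

# Toy records (made-up entities and values).
toy_edges <- data.frame(
  source   = c("X", "X", "X", "Y", "X"),
  target   = c("A", "A", "B", "X", "X"),
  hscode   = c("710210", "710210", "306170", "420212", "710210"),
  weightkg = c(100, 100, 50, 20, 100),
  year     = c(2030, 2030, 2030, 2031, 2030),
  stringsAsFactors = FALSE
)

as_source <- count_other_hscodes(toy_edges, "X", role = "source")
as_target <- count_other_hscodes(toy_edges, "X", role = "target")
as_source
as_target
```

With a helper of this kind, each of the four extraction steps reduces to a single call, for example count_other_hscodes(mc2_edges, "xiǎo xiā Oyj Marine ecology", "target").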

## xiǎo xiā as the Source
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
xiao_xia_source_details <- mc2_edges %>%
  filter(source == "xiǎo xiā Oyj Marine ecology") %>%
  filter(source != target) %>%
  group_by(target, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  distinct() %>%
  ungroup()

# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
xiao_xia_source_hscodes <- xiao_xia_source_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


## xiǎo xiā as the Target
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
xiao_xia_target_details <- mc2_edges %>%
  filter(target == "xiǎo xiā Oyj Marine ecology") %>%
  filter(source != target) %>%
  group_by(source, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  arrange(desc(weight)) %>%
  distinct() %>%
  ungroup()


# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
xiao_xia_target_hscodes <- xiao_xia_target_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


# Presenting the top 10 records in a data table (Source) for analysis
datatable(head(xiao_xia_source_hscodes, 10), 
          class="stripe",
          caption = "\nTable 5: List of Business Entities with 'Xiao Xia' as the Source\n",
          colnames = c("Target Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

# Presenting the top 10 records in a data table (Target) for analysis
datatable(head(xiao_xia_target_hscodes, 10),
          class="stripe",
          caption = "\nTable 6:  List of Business Entities with 'Xiao Xia' as the Target\n",
          colnames = c("Source Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

xiǎo xiā as the Source Provider

Table 5 shows the list of business entities with ‘xiǎo xiā’ as the source provider. From the list of HS codes, we noted that ‘xiǎo xiā Oyj Marine ecology’ also provides other products such as:

lighting or visual signalling equipment (for hscode == 851220)
plastics e.g. polyacetals (for hscode == 390720)
granite (for hscode == 680293)

Based on the values in the Count column for these products, we conclude that the trades were unlikely to be related to IUU fishing.

xiǎo xiā as the Target Recipient

Table 6 shows the list of business entities with ‘xiǎo xiā’ as the target recipient. From the list of HS codes, we noticed that ‘xiǎo xiā Oyj Marine ecology’ accepts not only fish and seafood, but also printed paper or paperboard labels (hscode == 482110), among others. However, given that the values in the Count column were not extremely high, we conclude that the trades were unlikely to be related to IUU fishing.

4.4. Looking at the In-Degree Centrality

In-degree centrality is another measure of centrality in a network that focuses on incoming connections to a node. Nodes with high in-degree centrality have a larger number of connections pointing towards them, indicating that they receive a lot of resources from other nodes.

Distribution for In-Degree Centrality Index
Code Chunk

# Plotting the Distribution for In-Degree centrality Index
nodes_df_in <- nodes_df %>%
  group_by(in_bin) %>%
  summarise(cnt = n())

# Create a bar chart with count labels using ggplot2
plot_in <- ggplot(nodes_df_in, aes(x = in_bin, y = cnt)) +
  geom_bar(stat = "identity", fill = "deepskyblue3") +
  geom_text(aes(label = cnt), vjust = -0.5, color = "black") +
  labs(x = "\nIn-Degree Centrality", y = "Count\n", 
       title = "Distribution of In-Degree Centrality of the Nodes") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 

plot_in

Observations from the Distribution of In-Degree Centrality Index

It is observed that there are 27 nodes with a significantly high in-degree centrality index (>=600); these are likely to be transhipment/collection hubs for the fishing vessels.
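The in_bin categories are created earlier in the document. As a rough illustration only, a binning of this kind could be produced with cut(). The break points below follow the thresholds mentioned in the text (300-599 and >=600), but the bin labels and the lower bin are assumptions, not the author's actual definition.

```r
# Hypothetical in-degree values for illustration
in_degree <- c(0, 12, 299, 300, 599, 600, 1450)

# Assumed bins: (-Inf, 299], (299, 599], (599, Inf)
in_bin <- cut(
  in_degree,
  breaks = c(-Inf, 299, 599, Inf),
  labels = c("Low (0-299)", "Medium (300-599)", "High (>=600)")
)

# Number of nodes that fall in each bin
bin_counts <- table(in_bin)
bin_counts
```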

4.4.1. Comparing the Different Centrality Indices

For the purpose of analysis, we will focus on the top 10 entities that have the highest in-degree centrality indices.

Show the codes

# Extracting the top 10 entities with highest in-degree centrality
nodes_df_in_top <- nodes_df %>%
  filter(in_degree >= 600) %>%
  select(id, label, in_degree, in_bin, 
         eigen_score, eigen_bin, out_degree, out_bin) %>%
  arrange(desc(in_degree))

top_10_in <- head(nodes_df_in_top,10)
  
datatable(top_10_in, 
          class="stripe",
          caption = "\nTable 7: Top 10 Business Entities with High In-Degree Centrality Index\n",
          colnames = c("ID", "Entity", "In-Degree Centrality Index",
                       "In-Degree Centrality Category",
                       "Eigenvector Centrality Index",
                       "Eigenvector Centrality Category",
                       "Out-Degree Centrality Index",
                       "Out-Degree Centrality Category"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

It is noted that the 4 entities that have high eigenvector centrality indices in Section 4.2.2. are also listed in Table 7 above. This supports our earlier suggestion that these nodes are likely to be key transhipment hubs (main collection centres) for the fishing vessels within the network.

4.4.2. In-Degree Centrality Network Graph

Show the codes

# get the entity names for the edges
edges_df_in <- edges_df %>%
  filter(from %in% top_10_in$id |
         to %in% top_10_in$id) %>%
  filter(weight > 80)

# build the nodes table
nodes_in <- nodes_df %>%
  filter(id %in% c(edges_df_in$from, edges_df_in$to)) %>%
  rename(group = in_bin) %>%
  arrange(label)

visNetwork(nodes_in,
           edges_df_in,
           main = "Entities with High In-Degree Centrality Index") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 1234) %>%
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T),
             nodesIdSelection = TRUE) %>%
  visNodes(id = nodes_in$id, size=30) %>%
  visLegend(width = 0.1, position = "right", main = "Group") %>%
  visEdges(arrows = "to", 
           smooth = list(enabled = TRUE, 
                         type = "curvedCW"))

Observations

The network graph above reaffirms that these nodes are likely to be the transhipment hubs for the fishing vessels, in view of the number of incoming links to them.

The graph also highlights that one of the source provider nodes, “Adriatic Tuna Seabass BV Transit”, has a moderately high in-degree centrality index (the node in red).

4.4.3. Close Examination of “Adriatic Tuna Seabass BV Transit”

As part of a close examination of “Adriatic Tuna Seabass BV Transit”, we will extract all the non-fishing related hscodes that involve this company as the source or the target, to understand what resources were being traded with the other entities. Once the relevant records are extracted, we will count the number of edges with n() and analyse the top 10 records with the highest counts.

Show the codes

## Adriatic as the Source
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
adriatic_source_details <- mc2_edges %>%
  filter(source == "Adriatic Tuna Seabass BV Transit") %>%
  filter(source != target) %>%
  group_by(target, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  distinct() %>%
  ungroup()

# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
adriatic_source_hscodes <- adriatic_source_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


## Adriatic as the Target
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
adriatic_target_details <- mc2_edges %>%
  filter(target == "Adriatic Tuna Seabass BV Transit") %>%
  filter(source != target) %>%
  group_by(source, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  arrange(desc(weight)) %>%
  distinct() %>%
  ungroup()


# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
adriatic_target_hscodes <- adriatic_target_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


# Presenting the top 10 records in a data table (Source) for analysis
datatable(head(adriatic_source_hscodes, 10), 
          class="stripe",
          caption = "\nTable 8: List of Business Entities with 'Adriatic' as the Source\n",
          colnames = c("Target Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Show the codes

# Presenting the top 10 records in a data table (Target) for analysis
datatable(head(adriatic_target_hscodes, 10), 
          class="stripe",
          caption = "\nTable 9:  List of Business Entities with 'Adriatic' as the Target\n",
          colnames = c("Source Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

Adriatic as the Source Provider

Table 8 shows the list of Business Entities with ‘Adriatic’ as the Source Provider. From the list of HS Codes, we noted that ‘Adriatic Tuna Seabass BV Transit’ also provides other consumables, particularly beancurd related products (for hscode == 210690) and mixtures of fruits, nuts and other edible parts of plants (for hscode == 200897). Based on the value of the Count column for these products, we conclude that the trades were unlikely to be related to IUU fishing.

Adriatic as the Target Recipient

Table 9 shows the list of Business Entities with ‘Adriatic’ as the Target Recipient. From the list of HS Codes, we noticed that ‘Adriatic Tuna Seabass BV Transit’ trades for a moderately high amount of pasta products (for hscode == 190219) and tableware and kitchenware products (for hscode == 848210). However, given that the values of the Count column were not extremely high, we conclude that the trades were unlikely to be related to IUU fishing.
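The same extract-filter-count steps are repeated for each entity examined in this document. As a sketch only, they could be wrapped in a small helper. The function name count_non_fishing_trades() and its arguments are hypothetical; hs3 and hs4 stand in for rel_hscodes_3 and rel_hscodes_4, and the toy edges and the "301" prefix are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: tally non-fishing trades for one entity.
# `role` is the column the entity is matched on ("source" or "target").
count_non_fishing_trades <- function(edges, entity,
                                     role = c("source", "target"),
                                     hs3 = character(), hs4 = character()) {
  role <- match.arg(role)
  partner <- if (role == "source") "target" else "source"

  edges %>%
    # keep trades involving the entity, excluding self-loops
    filter(.data[[role]] == entity, source != target) %>%
    # drop fishing related hscodes by 3- and 4-character prefix
    filter(!(substr(hscode, 1, 3) %in% hs3),
           !(substr(hscode, 1, 4) %in% hs4)) %>%
    group_by(across(all_of(c(partner, "weightkg", "hscode", "year")))) %>%
    summarise(weight = n(), .groups = "drop") %>%
    arrange(desc(weight))
}

# Toy edges (made-up values, not from mc2_edges)
toy_edges <- tibble(
  source   = c("A", "A", "A", "B"),
  target   = c("B", "B", "C", "A"),
  weightkg = c(100L, 100L, 50L, 10L),
  hscode   = c("851220", "851220", "301110", "482110"),
  year     = c(2030, 2030, 2030, 2031)
)

res <- count_non_fishing_trades(toy_edges, "A", "source", hs3 = c("301"))
res
```

With a helper like this, the source and target tables for each entity would differ only in the entity and role arguments.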

4.5. Looking at the Out-Degree Centrality

Out-degree centrality is a measure of centrality in a network that focuses on outgoing connections from a node. Nodes with high out-degree centrality have a larger number of connections pointing away from them, indicating that they have a greater reach and influence over other nodes.

Distribution for Out-Degree Centrality Index
Code Chunk

# Plotting the Distribution for Out-Degree Centrality Index
nodes_df_out <- nodes_df %>%
  group_by(out_bin) %>%
  summarise(cnt = n())

# Create a bar chart with count labels using ggplot2
plot_out <- ggplot(nodes_df_out, aes(x = out_bin, y = cnt)) +
  geom_bar(stat = "identity", fill = "deepskyblue3") +
  geom_text(aes(label = cnt), vjust = -0.5, color = "black") +
  labs(x = "\nOut-Degree Centrality", y = "Count\n", 
       title = "Distribution of Out-Degree Centrality of the Nodes") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 

plot_out

Observations from the Distribution of Out-Degree Centrality Index

The majority of the entities (99.8%) in the network have a low out-degree centrality index, which suggests that they are likely to be the fishing vessels within the network, each contributing to one of the major transhipment hubs, fishery wharves or collection centres.

It is observed that there are 18 nodes with a significantly high out-degree centrality index (>300); we will study the top 10 of these nodes in more detail.
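A percentage share such as the one quoted above can be derived from the per-bin counts. The sketch below shows one way to do this with dplyr on a toy node table; the bin labels and counts are made up and do not reproduce the 99.8% figure.

```r
library(dplyr)

# Toy node table with an out-degree bin per node (hypothetical labels)
toy_nodes <- tibble(out_bin = c(rep("Low", 8), "Medium", "High"))

# Count the nodes per bin and express each count as a share of all nodes
out_share <- toy_nodes %>%
  count(out_bin, name = "cnt") %>%
  mutate(pct = round(100 * cnt / sum(cnt), 1))

out_share
```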

4.5.1. Comparing the Different Centrality Indices

Show the codes

# Extracting the top companies with highest out-degree centrality
nodes_df_out <- nodes_df %>%
  filter(out_degree > 300) %>%
  select(id, label, out_degree, out_bin, degree_score, deg_bin,
         betweenness_score, bet_bin) %>%
  arrange(desc(out_degree))

top_10_out <- head(nodes_df_out,10)

datatable(top_10_out, 
          class="stripe",
          caption = "\nTable 10: Top 10 Business Entities with High Out-Degree Centrality Index\n",
          colnames = c("ID", "Entity", "Out-Degree Centrality Index",
                       "Out-Degree Centrality Category",
                       "Degree Centrality Index",
                       "Degree Centrality Category", 
                       "Betweenness Centrality Index",
                       "Betweenness Centrality Category"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                  targets="_all"))))

Observations

It is observed that the top 10 entities with a high out-degree centrality index also have a high degree centrality index, and 2 of them also have a high betweenness centrality index. This suggests that these 2 entities (“The Salty Dog Limited Liability Company” and “Aqua Aura SE Marine life”) are likely to be the primary transhipment hubs within the network.

4.5.2. Out-Degree Centrality Network Graph

Show the codes

# get the company names for the edges
edges_df_out <- edges_df %>%
  filter(from %in% top_10_out$id |
         to %in% top_10_out$id) %>%
  filter(weight > 30)

# build the nodes table
nodes_out <- nodes_df %>%
  filter(id %in% c(edges_df_out$from, edges_df_out$to)) %>%
  rename(group = out_bin) %>%
  arrange(label)

visNetwork(nodes_out,
           edges_df_out,
           main = "Entities with High Out-Degree Centrality Index") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 1234) %>%
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T),
             nodesIdSelection = TRUE) %>%
  visNodes(id = nodes_out$id, size=30) %>%
  visLegend(width = 0.1, position = "right", main = "Group") %>%
  visEdges(arrows = "to", 
           smooth = list(enabled = TRUE, 
                         type = "curvedCW"))

Observations

It is observed that most of the top 10 entities with high out-degree centrality indices only have outgoing connections to external nodes. However, the #1 entity on the list, “The Salty Dog Limited Liability Company”, has an incoming connection from another node, “Marine Adventures Ltd. Corporation Brothers”. Hence, we will take a closer look at the exchanges between them.

4.5.3. Close Examination of “Marine Adventures Ltd. Corporation Brothers”

We will first extract all the exchanges between “Marine Adventures Ltd. Corporation Brothers” and “The Salty Dog Limited Liability Company” that do not fall under the fishing related hscodes, and then examine the other products (if any) that these 2 entities are trading.

Show the codes

## Marine Adventures as the Source and Salty Dog as the Target
# Extract all the non-fishing related hscodes from the original list of edges (i.e. mc2_edges)
marine_source_details <- mc2_edges %>%
  filter(source == "Marine Adventures Ltd. Corporation Brothers" &
         target == "The Salty Dog Limited Liability Company") %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  distinct() %>%
  ungroup()

marine_source_details

# A tibble: 0 × 4
# ℹ 4 variables: weightkg <int>, hscode <chr>, year <dbl>, weight <int>

Observations

Based on the empty result, we conclude that ‘Marine Adventures Ltd. Corporation Brothers’ did not send ‘The Salty Dog Limited Liability Company’ any products outside the fishing related hscodes.

4.6. Looking at the Betweenness Centrality

Betweenness centrality is a measure of the importance or influence of a node in a network based on its ability to connect other nodes. Nodes with high betweenness centrality act as bridges or intermediaries, connecting different parts of the network and have the potential to control the flow of information or influence within the network.

In the context of this exercise, nodes with high betweenness centrality might potentially have a large impact on the network.
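To make the "bridge" idea concrete, the sketch below computes betweenness on a toy directed graph with igraph. The node names are hypothetical, and the betweenness_score column in nodes_df may have been computed with different settings.

```r
library(igraph)

# Toy network: every path from A or B to C or D must pass through "Bridge"
toy_edges <- data.frame(
  from = c("A", "B", "Bridge", "Bridge"),
  to   = c("Bridge", "Bridge", "C", "D")
)
g <- graph_from_data_frame(toy_edges, directed = TRUE)

# Betweenness: number of shortest paths that pass through each node
bet <- betweenness(g, directed = TRUE)
bet
```

"Bridge" lies on all four shortest paths (A to C, A to D, B to C, B to D), so it is the only node with a non-zero score.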

Distribution of Betweenness Centrality Index
Code Chunk

Show the codes

# Get the count of the records in each bin
nodes_df_bet_gp <- nodes_df %>%
  group_by(bet_bin) %>%
  summarise(cnt = n())

# Create a distribution with bar chart using ggplot2
plot_bet <- ggplot(nodes_df_bet_gp, aes(x = bet_bin, y = cnt)) +
  geom_bar(stat = "identity", fill = "deepskyblue3") +
  geom_text(aes(label = cnt), vjust = -0.5, color = "black") +
  labs(x = "\nBetweenness Centrality", y = "Count\n", 
       title = "Distribution of Betweenness Centrality of the Nodes") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 

plot_bet

Observations from the Distribution of Betweenness Centrality Index

The betweenness centrality distribution reveals that a large number of nodes have a low value; these might be the fishing vessels or small clusters without much influence on the overall network.

However, it is noted that there are 11 nodes with a very high betweenness centrality index (>=500K), and we should take a closer look at them to understand their relationships with the other nodes.

4.6.1. Comparing the Different Centrality Indices

For the purpose of analysis, we will focus on the top 11 entities that have the highest betweenness centrality indices.

Show the codes

# Extract the Top 11 nodes with extremely high betweenness centrality index

nodes_df_bet_top11 <- nodes_df %>%
  filter(betweenness_score >= 500000) %>%
  select(id, label, betweenness_score, bet_bin, 
         in_degree, in_bin, out_degree, out_bin) %>%  
  arrange(desc(betweenness_score))

datatable(nodes_df_bet_top11, 
          class="stripe",
          caption = "\nTable 11: Top 11 Business Entities with High Betweenness Centrality Index\n",
          colnames = c("ID", "Entity", "Betweenness Centrality Index", 
                       "Betweenness Centrality Category", "In-Degree Centrality Index",
                       "In-Degree Centrality Category", "Out-Degree Centrality Index",
                       "Out-Degree Centrality Category"),
          options = list(
            columnDefs = list(list(className = 'dt-center', targets = "_all"))))

Observations

Table 11 above revealed that 10 out of the 11 entities have low values for their in-degree and out-degree centrality indices.

‘David Ltd. Liability Co Forwading’ is the only entity with an in-degree centrality index in the medium range (300-599). We will take a closer look at this.

4.6.2. Betweenness Centrality Network Graph

Show the codes

# get the entity names for the edges
edges_df_bet <- edges_df %>%
  filter(from %in% nodes_df_bet_top11$id |
         to %in% nodes_df_bet_top11$id) %>%
  filter(weight > 20)

# build the nodes table
nodes_bet <- nodes_df %>%
  filter(id %in% c(edges_df_bet$from, edges_df_bet$to)) %>%
  rename(group = bet_bin) %>%
  arrange(label)

visNetwork(nodes_bet,
           edges_df_bet,
           main = "Entities with High Betweenness Centrality Index") %>%
  visIgraphLayout(layout = "layout_with_kk") %>%
  visLayout(randomSeed = 1234) %>%
  visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T),
             nodesIdSelection = TRUE) %>%
  visNodes(id = nodes_bet$id, size=30) %>%
  visLegend(width = 0.1, position = "right", main = "Group") %>%
  visEdges(arrows = "to", 
           smooth = list(enabled = TRUE))

Observations

The network graph above shows that there are 2 big clusters of nodes.

The image above shows that ‘Perla del Mar Deep-sea’ is the key node that connects the 2 nodes (‘Punjab s Marine Conservation’ and ‘David Ltd. Liability Co Forwading’) with high betweenness indices.

The second image shows that ‘Caracola del Sol Services’ is the key node that connects the 3 nodes (‘Marine Mates NV Worldwide’, ‘Yenisei Eel GmbH & Co. KG Services’ and ‘Turkish Salmon A/S Marine’) with high betweenness indices.
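A connecting node of this kind can also be checked programmatically by intersecting the neighbour sets of the high-betweenness nodes. The sketch below does this on a toy graph with igraph; all node names are hypothetical placeholders rather than entities from the challenge data.

```r
library(igraph)

# Toy network: "Connector" links to both "Hub 1" and "Hub 2"
toy_edges <- data.frame(
  from = c("Connector", "Connector", "Other", "Hub 1"),
  to   = c("Hub 1",     "Hub 2",     "Hub 1", "Leaf")
)
g <- graph_from_data_frame(toy_edges, directed = TRUE)

# Neighbours of each hub, ignoring edge direction
nb_hub1 <- neighbors(g, "Hub 1", mode = "all")$name
nb_hub2 <- neighbors(g, "Hub 2", mode = "all")$name

# Nodes adjacent to both hubs are candidate connectors
shared <- intersect(nb_hub1, nb_hub2)
shared
```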

4.6.3. Close Examination of “Perla del Mar Deep-sea”

Show the codes

## Perla as the Source
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
perla_source_details <- mc2_edges %>%
  filter(source == "Perla del Mar Deep-sea") %>%
  filter(source != target) %>%
  group_by(target, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  distinct() %>%
  ungroup()

# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
perla_source_hscodes <- perla_source_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


## Perla as the Target
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
perla_target_details <- mc2_edges %>%
  filter(target == "Perla del Mar Deep-sea") %>%
  filter(source != target) %>%
  group_by(source, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  arrange(desc(weight)) %>%
  distinct() %>%
  ungroup()


# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
perla_target_hscodes <- perla_target_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


# Presenting the records in a data table (Source) for analysis
datatable(perla_source_hscodes, 
          class="stripe",
          caption = "\nTable 12: List of Business Entities with 'Perla' as the Source\n",
          colnames = c("Target Entity", "Weightkg", 
                       "HS Code", "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Show the codes

# Presenting the top 10 records in a data table (Target) for analysis
datatable(head(perla_target_hscodes,10), 
          class="stripe",
          caption = "\nTable 13:  List of Business Entities with 'Perla' as the Target\n",
          colnames = c("Source Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

Perla as the Source Provider

Table 12 shows the list of Business Entities with ‘Perla’ as the Source Provider. From the list of HS Codes, we noted that ‘Perla del Mar Deep-sea’ also provides a variety of other products to other entities, such as rubber pneumatic tyres for tractors and machinery (hscode == 401180) and upholstered seats (hscode == 940161). Based on the value of the Count column for these products, we conclude that the trades were unlikely to be related to IUU fishing.

Perla as the Target Recipient

Table 13 shows the list of Business Entities with ‘Perla’ as the Target Recipient. From the list of HS Codes, we noticed that ‘Perla del Mar Deep-sea’ accepts not only fish and seafood but also parts suitable for use solely or principally with spark-ignition internal combustion piston engines (for hscode == 840991) from ‘Costa del Sol Sp’, particularly in 2028 and 2029.

4.6.4. Close Examination of “Caracola del Sol Services”

Show the codes

## Caracola as the Source
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
caracol_source_details <- mc2_edges %>%
  filter(source == "Caracola del Sol Services") %>%
  filter(source != target) %>%
  group_by(target, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  distinct() %>%
  ungroup()

# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
caracol_source_hscodes <- caracol_source_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


## Caracola as the Target
# Extract all the list of hscodes from the original list of edges (i.e. mc2_edges)
caracol_target_details <- mc2_edges %>%
  filter(target == "Caracola del Sol Services") %>%
  filter(source != target) %>%
  group_by(source, weightkg, hscode, year) %>%
  summarise(weight = n()) %>%
  arrange(desc(weight)) %>%
  distinct() %>%
  ungroup()


# Extract the non-fishing related hscodes and year; and
# sort the records by the weight in descending order
caracol_target_hscodes <- caracol_target_details %>%
  filter(!(substr(hscode, 1, 3) %in% rel_hscodes_3) &
         !(substr(hscode, 1, 4) %in% rel_hscodes_4)) %>%
  group_by(weightkg, hscode, year, weight) %>%
  distinct() %>%
  arrange(desc(weight)) %>%
  ungroup()


# Presenting the top 10 records in a data table (Source) for analysis
datatable(head(caracol_source_hscodes,10), 
          class="stripe",
          caption = "\nTable 14: List of Business Entities with 'Caracola' as the Source\n",
          colnames = c("Target Entity", "Weightkg", 
                       "HS Code", "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Show the codes

# Presenting the top 10 records in a data table (Target) for analysis
datatable(head(caracol_target_hscodes,10), 
          class="stripe",
          caption = "\nTable 15:  List of Business Entities with 'Caracola' as the Target\n",
          colnames = c("Source Entity", "Weightkg", "HS Code",
                       "Year", "Count"),
          options = list(
            columnDefs = list(list(className = 'dt-center', 
                                   targets="_all"))))

Observations

Caracola as the Source Provider

Table 14 shows the list of Business Entities with ‘Caracola’ as the Source Provider. From the list of HS Codes, we noted that ‘Caracola del Sol Services’ also provides a variety of other products to other entities, such as rubber pneumatic tyres for tractors and machinery (hscode == 401180) and upholstered seats (hscode == 940161). Based on the value of the Count column for these products, we conclude that the trades were unlikely to be related to IUU fishing.

Caracola as the Target Recipient

Table 15 shows the list of Business Entities with ‘Caracola’ as the Target Recipient. From the list of HS Codes, we noticed that ‘Caracola del Sol Services’ accepts not only fish and seafood but also a significant volume of other products such as:

articles for the conveyance or packing of goods, of plastics; stoppers, lids, caps and other closures, of plastics (hscode == 392390)
aluminium alloys (hscode == 760120).

Since these items are not directly linked to IUU fishing activities, we conclude that the trades were unlikely to be related to IUU fishing.

5. Conclusion

Even though no entities were found to be clearly suspicious of involvement in Illegal, Unreported and Unregulated (IUU) fishing, we have managed to identify the key transhipment hubs in the network. The analysis also revealed that these hubs do not necessarily just receive catches from the fishing vessels; they also trade other goods/products with them, either as a provider or as a recipient. These findings emerged from the closer examinations of some of the key business entities identified in Sections 4.3 to 4.6.
