Re-creating famous pathways from SIGNOR and WIKIPATHWAYS using NeKo

[1]:

from neko.core.network import Network
from neko._visual.visualize_network import NetworkVisualizer
from neko.inputs import Universe, signor
import omnipath as op

1) Retrieving the MTOR Signaling from Signor using NeKo

[2]:

mtor_nodes = ["RPS6KA1", "SREBF1", "MTOR", "RPTOR", "INSR", "RPS6KB1", "RHEB", "EIF4EBP1", "INS", "PTEN",
             "RPS6", "TFEB", "PIK3R1", "PPARGC1A", "PDPK1", "AKT1S1", "PPARG", "PIK3CA", "EIF4E", "IRS1", "GSK3B", "ULK1"]

[3]:

import random

[4]:

seeds_number = 4
random_seeds = random.sample(mtor_nodes, seeds_number)

[5]:

random_seeds

[5]:

['AKT1S1', 'TFEB', 'SREBF1', 'RPS6']

[6]:

resources = Universe()
resources = signor("../neko/_data/signor_db.tsv")  # this function accepts only tab-separated values
resources.build()

Let’s compare the function complete_connection (based on the Reciprocal Pathway Extender algorithm, RPE) with the function connect_network_radially (based on the Iterative Neighbor Expansion algorithm, INE). We are going to create two NeKo networks and apply one of the two functions to each of them. Finally, we are going to compare the resulting networks.

[7]:

neko_net1 = Network(random_seeds, resources = resources.interactions)

[8]:

neko_net2 = Network(random_seeds, resources = resources.interactions)

[9]:

%%time
neko_net1.complete_connection(maxlen=3, algorithm="dfs", only_signed=True, connect_with_bias=False, consensus=False)

CPU times: user 402 ms, sys: 3.1 ms, total: 406 ms
Wall time: 404 ms

Note

The max_len argument of the function connect_network_radially should be kept at 1, or at most 2. Why?

The Iterative Neighbor Expansion, as the name suggests, iterates through all the seed nodes and adds to the network all the interactions found. In the next step, it iterates through all the neighbors found and looks for their neighbors. As a result, the network size can grow exponentially if some of the neighbor nodes are hubs (nodes with a high degree of connection).
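The growth described in this note can be illustrated with a small pure-Python sketch. This is not NeKo's implementation: the function `expand_neighbors`, the undirected toy graph, and the node names are all hypothetical and only mimic the idea of adding the neighbors of the current frontier at each step.

```python
from collections import defaultdict

def expand_neighbors(edges, seeds, max_len):
    """Toy iterative neighbor expansion over an undirected edge list."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    visited = set(seeds)
    frontier = set(seeds)
    sizes = [len(visited)]  # network size before any expansion step
    for _ in range(max_len):
        # Collect every neighbor of the current frontier that is not in the network yet.
        next_frontier = set()
        for node in frontier:
            next_frontier |= adjacency[node] - visited
        visited |= next_frontier
        frontier = next_frontier
        sizes.append(len(visited))
    return visited, sizes

# Hypothetical toy graph: the seed "A" touches "HUB", which has 50 further neighbors.
toy_edges = [("A", "B"), ("A", "HUB")] + [("HUB", f"N{i}") for i in range(50)]

for max_len in (1, 2):
    _, sizes = expand_neighbors(toy_edges, ["A"], max_len)
    print(max_len, sizes)
# 1 [1, 3]
# 2 [1, 3, 53]
```

With one step the toy network has 3 nodes. The second step reaches the hub's neighbors and the network jumps to 53 nodes, which is why a small max_len is recommended.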

[10]:

%%time
neko_net2.connect_network_radially(max_len=2, only_signed=True, consensus=False)

CPU times: user 6.45 s, sys: 1.39 ms, total: 6.45 s
Wall time: 6.45 s

Now let’s visualize the networks:

[11]:

#Visualize network
visualizer1 = NetworkVisualizer(neko_net1, color_by='effect', noi=True)
visualizer1.render("./img/Complete_connection_neko_net_sample", view=True)

[12]:

#Visualize network
visualizer2 = NetworkVisualizer(neko_net2, color_by='effect', noi=True)
visualizer2.render("./img/Radial_neko_net_sample", view=True)

Let’s compare the networks. We will use only those nodes in SIGNOR that are not complexes or protein families (so we exclude all the nodes whose names start with “SIGNOR_”). The full MTOR pathway from SIGNOR is available at the following link: https://signor.uniroma2.it/pathway_browser.php?beta=3.0&organism=&pathway_list=SIGNOR-MS&x=13&y=13
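As a minimal sketch of the exclusion rule above, a reference list can be filtered by name prefix. The helper `keep_gene_nodes` and the example identifiers are hypothetical; in particular, the two "SIGNOR_" names are placeholders, not real SIGNOR entries.

```python
def keep_gene_nodes(node_names, prefix="SIGNOR_"):
    """Return the names that do not start with the given prefix, preserving order."""
    return [name for name in node_names if not name.startswith(prefix)]

# Placeholder identifiers for illustration only.
example_nodes = ["MTOR", "SIGNOR_COMPLEX_X", "RPTOR", "SIGNOR_FAMILY_Y", "RHEB"]
print(keep_gene_nodes(example_nodes))
# ['MTOR', 'RPTOR', 'RHEB']
```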

[13]:

net1_nodes_size = len(neko_net1.nodes)
net2_nodes_size = len(neko_net2.nodes)

print("Number of nodes of the first NeKo network: ", net1_nodes_size)
print("Number of nodes of the second NeKo network: ", net2_nodes_size)

Number of nodes of the first NeKo network:  13
Number of nodes of the second NeKo network:  58

[14]:

net1_edges_size = len(neko_net1.edges)
net2_edges_size = len(neko_net2.edges)

print("Number of edges of the first NeKo network: ", net1_edges_size)
print("Number of edges of the second NeKo network: ", net2_edges_size)

Number of edges of the first NeKo network:  33
Number of edges of the second NeKo network:  161

The first observation is that the RPE algorithm is much faster than the INE one (402 ms vs 6.45 s). Despite being slower, the INE algorithm produced a much bigger network (58 nodes vs 13, 161 edges vs 33).

[15]:

nodes_found = []
for node in mtor_nodes:
    if node in list(neko_net1.nodes["Genesymbol"]):
        nodes_found.append(node)

print("Initial nodes: ", random_seeds)
print("Nodes in the MTOR pathways: ", mtor_nodes)
print("Nodes found: ", nodes_found)

Initial nodes:  ['AKT1S1', 'TFEB', 'SREBF1', 'RPS6']
Nodes in the MTOR pathways:  ['RPS6KA1', 'SREBF1', 'MTOR', 'RPTOR', 'INSR', 'RPS6KB1', 'RHEB', 'EIF4EBP1', 'INS', 'PTEN', 'RPS6', 'TFEB', 'PIK3R1', 'PPARGC1A', 'PDPK1', 'AKT1S1', 'PPARG', 'PIK3CA', 'EIF4E', 'IRS1', 'GSK3B', 'ULK1']
Nodes found:  ['SREBF1', 'MTOR', 'RPS6KB1', 'RPS6', 'TFEB', 'AKT1S1']

[16]:

print("Percentage of genes covered: ", (len(nodes_found)/len(mtor_nodes)) * 100)

Percentage of genes covered:  27.27272727272727

[17]:

nodes_found = []
for node in mtor_nodes:
    if node in list(neko_net2.nodes["Genesymbol"]):
        nodes_found.append(node)

print("Initial nodes: ", random_seeds)
print("Nodes in the MTOR pathways: ", mtor_nodes)
print("Nodes found: ", nodes_found)

Initial nodes:  ['AKT1S1', 'TFEB', 'SREBF1', 'RPS6']
Nodes in the MTOR pathways:  ['RPS6KA1', 'SREBF1', 'MTOR', 'RPTOR', 'INSR', 'RPS6KB1', 'RHEB', 'EIF4EBP1', 'INS', 'PTEN', 'RPS6', 'TFEB', 'PIK3R1', 'PPARGC1A', 'PDPK1', 'AKT1S1', 'PPARG', 'PIK3CA', 'EIF4E', 'IRS1', 'GSK3B', 'ULK1']
Nodes found:  ['RPS6KA1', 'SREBF1', 'MTOR', 'RPS6KB1', 'PTEN', 'RPS6', 'TFEB', 'PPARGC1A', 'AKT1S1', 'PPARG', 'GSK3B', 'ULK1']

[18]:

print("Percentage of genes covered: ", (len(nodes_found)/len(mtor_nodes)) * 100)

Percentage of genes covered:  54.54545454545454
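The coverage check is repeated for each network, so it can be wrapped in a small helper. The sketch below is one possible way to do it: the function name `pathway_coverage` is hypothetical, and the gene lists are copied from the outputs above so that the block runs without NeKo.

```python
def pathway_coverage(pathway_genes, network_genes):
    """Return (recovered genes in pathway order, percentage of the pathway recovered)."""
    network_set = set(network_genes)  # set lookup instead of rebuilding a list per gene
    recovered = [gene for gene in pathway_genes if gene in network_set]
    return recovered, 100 * len(recovered) / len(pathway_genes)

mtor_nodes = ["RPS6KA1", "SREBF1", "MTOR", "RPTOR", "INSR", "RPS6KB1", "RHEB", "EIF4EBP1",
              "INS", "PTEN", "RPS6", "TFEB", "PIK3R1", "PPARGC1A", "PDPK1", "AKT1S1",
              "PPARG", "PIK3CA", "EIF4E", "IRS1", "GSK3B", "ULK1"]

# Pathway genes reported above for the RPE (neko_net1) and INE (neko_net2) networks.
rpe_found = ["SREBF1", "MTOR", "RPS6KB1", "RPS6", "TFEB", "AKT1S1"]
ine_found = ["RPS6KA1", "SREBF1", "MTOR", "RPS6KB1", "PTEN", "RPS6", "TFEB",
             "PPARGC1A", "AKT1S1", "PPARG", "GSK3B", "ULK1"]

for label, genes in (("RPE", rpe_found), ("INE", ine_found)):
    recovered, percentage = pathway_coverage(mtor_nodes, genes)
    print(f"{label}: {len(recovered)}/{len(mtor_nodes)} genes, {percentage:.1f}%")
# RPE: 6/22 genes, 27.3%
# INE: 12/22 genes, 54.5%
```

In the notebook itself, the second argument would be the gene symbols of the network, for example `neko_net1.nodes["Genesymbol"]` as used in the cells above.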

As expected given the network size, the INE algorithm captured more genes belonging to the MTOR pathway than the RPE algorithm. This is also expected because the RPE algorithm aims at finding the minimal set of genes that can connect all the seed nodes given by the user, while the INE algorithm does not take shortest paths into account and instead explores the surroundings of the seeds homogeneously.
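To make the contrast concrete, here is a deliberately simplified sketch of connecting seeds through short paths. It is not NeKo's complete_connection: the functions `shortest_path` and `connect_seeds`, the breadth-first search, and the toy directed graph are assumptions used only to show why a path-based strategy keeps fewer nodes than a neighbor-based one.

```python
from collections import deque
from itertools import permutations

def shortest_path(adjacency, source, target, maxlen):
    """Breadth-first search for one shortest directed path of at most maxlen edges."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 == maxlen:
            continue  # this path already uses maxlen edges; do not extend it
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

def connect_seeds(adjacency, seeds, maxlen):
    """Keep the seeds plus the nodes on a short path between any ordered pair of seeds."""
    kept = set(seeds)
    for source, target in permutations(seeds, 2):
        path = shortest_path(adjacency, source, target, maxlen)
        if path:
            kept.update(path)
    return kept

# Hypothetical directed toy graph: S1 reaches S2 through X (2 edges) or through Y, W, V (4 edges).
toy = {
    "S1": ["X", "Y", "Z"],
    "X": ["S2"],
    "Y": ["W"],
    "W": ["V"],
    "V": ["S2"],
}
print(sorted(connect_seeds(toy, ["S1", "S2"], maxlen=3)))
# ['S1', 'S2', 'X']
```

Only X is added, because it lies on the shortest path from S1 to S2. A one-step neighbor expansion from S1 would instead add X, Y and Z, whether or not they help connect the seeds.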

2) Retrieving the EGF/EGFR pathway (source: WikiPathways) using Omnipath

The MTOR pathway we saw above is a relatively small pathway. We decided to test the INE and RPE algorithms on a bigger one, the EGF/EGFR pathway as shown in WikiPathways, fetching interactions from Omnipath.

To do so, we can install the Python package pywikipathways to quickly retrieve the genes belonging to the EGF/EGFR pathway (WP437, https://www.wikipathways.org/pathways/WP437.html).

[19]:

# run the following line if you do not have pywikipathways installed
!pip install pywikipathways

Collecting pywikipathways
  Using cached pywikipathways-0.0.3-py3-none-any.whl.metadata (2.5 kB)
Requirement already satisfied: lxml in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pywikipathways) (5.3.0)
Requirement already satisfied: pandas in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pywikipathways) (2.2.2)
Requirement already satisfied: requests in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pywikipathways) (2.32.3)
Requirement already satisfied: numpy>=1.26.0 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pandas->pywikipathways) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pandas->pywikipathways) (3.9.0)
Requirement already satisfied: pytz>=2020.1 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pandas->pywikipathways) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from pandas->pywikipathways) (2024.1)
Requirement already satisfied: six>=1.5 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas->pywikipathways) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from requests->pywikipathways) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from requests->pywikipathways) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from requests->pywikipathways) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in /home/mruscone/Desktop/github/Neko/test_env/lib/python3.12/site-packages (from requests->pywikipathways) (2024.7.4)
Using cached pywikipathways-0.0.3-py3-none-any.whl (12 kB)
Installing collected packages: pywikipathways
Successfully installed pywikipathways-0.0.3

[20]:

import pywikipathways as pwpw

[21]:

pwpw.get_pathway_info('WP437')

[21]:

{'id': 'WP437',
 'url': 'https://classic.wikipathways.org/index.php/Pathway:WP437',
 'name': 'EGF/EGFR signaling',
 'species': 'Homo sapiens',
 'revision': '137261'}

[22]:

egf_egfr_genes = pwpw.get_xref_list('WP437','H')

[23]:

print(len(egf_egfr_genes))
egf_egfr_genes

[23]:

['ABI1',
 'ABL1',
 'AKT1',
 'AP2A1',
 'AP2B1',
 'AP2M1',
 'AP2S1',
 'ARF6',
 'ARHGEF1',
 'ASAP1',
 'ATF1',
 'ATXN2',
 'AURKA',
 'BCAR1',
 'BRAF',
 'CAMK2A',
 'CAV1',
 'CAV2',
 'CBL',
 'CBLB',
 'CBLC',
 'CDC42',
 'CFL1',
 'CREB1',
 'CRK',
 'CRKL',
 'CSK',
 'DNM1',
 'DOK2',
 'E2F1',
 'EGF',
 'EGFR',
 'EIF4EBP1',
 'ELK1',
 'ELK4',
 'EPN1',
 'EPS15',
 'EPS15L1',
 'EPS8',
 'ERBB2',
 'ERRFI1',
 'FOS',
 'FOSB',
 'FOXO1',
 'FOXO4',
 'GAB1',
 'GAB2',
 'GJA1',
 'GRB10',
 'GRB2',
 'HGS',
 'HRAS',
 'INPP5D',
 'INPPL1',
 'IQGAP1',
 'IQSEC1',
 'ITCH',
 'JAK1',
 'JAK2',
 'JUN',
 'JUND',
 'KRAS',
 'LIMK2',
 'MAP2K1',
 'MAP2K2',
 'MAP2K5',
 'MAP3K1',
 'MAP3K2',
 'MAP3K3',
 'MAP3K4',
 'MAP4K1',
 'MAPK1',
 'MAPK14',
 'MAPK4',
 'MAPK7',
 'MAPK8',
 'MAPK9',
 'MEF2A',
 'MEF2C',
 'MEF2D',
 'MT-CO2',
 'MTOR',
 'MYBL2',
 'NCK1',
 'NCK2',
 'NCOA3',
 'NDUFA13',
 'NEDD4',
 'NEDD8',
 'NOS3',
 'PAK1',
 'PCNA',
 'PDPK1',
 'PEBP1',
 'PIAS3',
 'PIK3C2B',
 'PIK3R1',
 'PIK3R2',
 'PLCE1',
 'PLCG1',
 'PLD1',
 'PLD2',
 'PLSCR1',
 'PRKCA',
 'PRKCB',
 'PRKCD',
 'PRKCI',
 'PRKCZ',
 'PTEN',
 'PTK2',
 'PTK2B',
 'PTK6',
 'PTPN11',
 'PTPN12',
 'PTPN5',
 'PTPRR',
 'PXDN',
 'RAB5A',
 'RAC1',
 'RAF1',
 'RALA',
 'RALB',
 'RALBP1',
 'RALGDS',
 'RAP1A',
 'RASA1',
 'REPS2',
 'RICTOR',
 'RIN1',
 'ROCK1',
 'RPS6KA1',
 'RPS6KA2',
 'RPS6KA3',
 'RPS6KA5',
 'RPS6KB1',
 'SH2D2A',
 'SH3GL2',
 'SH3GL3',
 'SH3KBP1',
 'SHC1',
 'SOS1',
 'SOS2',
 'SP1',
 'SPRY2',
 'SRC',
 'STAM',
 'STAM2',
 'STAMBP',
 'STAT1',
 'STAT3',
 'STAT5A',
 'STAT5B',
 'STMN1',
 'STXBP1',
 'SYNJ1',
 'TNK2',
 'TWIST1',
 'USP6NL',
 'USP8',
 'VAV1',
 'VAV2',
 'VAV3']

Let’s select a random subset of those genes and proceed with building the network with NeKo!

[24]:

seeds_number = 20
random_seeds = random.sample(egf_egfr_genes, seeds_number)

[25]:

random_seeds

[25]:

['PEBP1',
 'SOS1',
 'GAB2',
 'CDC42',
 'CBLC',
 'EIF4EBP1',
 'HGS',
 'RICTOR',
 'PTPRR',
 'GRB2',
 'RAF1',
 'RAC1',
 'MTOR',
 'ELK4',
 'RPS6KA5',
 'EPS15L1',
 'USP6NL',
 'PIK3R1',
 'ABL1',
 'PTK2']

TIP

NeKo provides some built-in functions to easily plug in some well-known databases, like Omnipath, SIGNOR, PhosphoSitePlus and HuRI. More information can be found in Notebook #2.

[26]:

neko_net3 = Network(random_seeds, resources = 'omnipath')

[27]:

neko_net4 = Network(random_seeds, resources = 'omnipath')

Once again, now that we have created the NeKo networks, let’s use the RPE and INE algorithms to retrieve (hopefully) the full EGF/EGFR pathway.

Note

Since the AllOmnipath database is very big and we have a higher number of seed nodes, the computational time and cost will be higher too! In some cases, expect both complete_connection and connect_network_radially to take minutes!

[28]:

%%time
neko_net3.complete_connection(maxlen=3, algorithm="dfs", only_signed=True, connect_with_bias=False, consensus=False)

CPU times: user 3min 18s, sys: 24.7 ms, total: 3min 18s
Wall time: 3min 18s

[29]:

%%time
neko_net4.connect_network_radially(max_len=1, only_signed=True, consensus=False)

CPU times: user 41.7 s, sys: 3.03 ms, total: 41.7 s
Wall time: 41.7 s

[30]:

# This time the networks are very big and can be difficult to visualize
#visualizer3 = NetworkVisualizer(neko_net3, color_by='effect', noi=True)
#visualizer3.render("./img/Complete_connection_neko_net_sample_EGF", view=True)

[31]:

#Visualize network
#visualizer4 = NetworkVisualizer(neko_net4, color_by='effect', noi=True)
#visualizer4.render("./img/Radial_neko_net_sample_EGF", view=True)

As we did previously, let’s compare the sizes of the networks and check whether we found nodes belonging to the EGF/EGFR pathway from WikiPathways.

[32]:

net3_nodes_size = len(neko_net3.nodes)
net4_nodes_size = len(neko_net4.nodes)

print("Number of nodes of the third NeKo network: ", net3_nodes_size)
print("Number of nodes of the fourth NeKo network: ", net4_nodes_size)

Number of nodes of the third NeKo network:  284
Number of nodes of the fourth NeKo network:  273

[33]:

net3_edges_size = len(neko_net3.edges)
net4_edges_size = len(neko_net4.edges)

print("Number of edges of the third NeKo network: ", net3_edges_size)
print("Number of edges of the fourth NeKo network: ", net4_edges_size)

Number of edges of the third NeKo network:  5109
Number of edges of the fourth NeKo network:  1107

[34]:

nodes_found = []
for node in egf_egfr_genes:
    if node in list(neko_net3.nodes["Genesymbol"]):
        nodes_found.append(node)

print("Initial nodes: ", random_seeds)
print("Nodes in the EGF/EGFR pathways: ", egf_egfr_genes)
print("Nodes found: ", nodes_found)

Initial nodes:  ['PEBP1', 'SOS1', 'GAB2', 'CDC42', 'CBLC', 'EIF4EBP1', 'HGS', 'RICTOR', 'PTPRR', 'GRB2', 'RAF1', 'RAC1', 'MTOR', 'ELK4', 'RPS6KA5', 'EPS15L1', 'USP6NL', 'PIK3R1', 'ABL1', 'PTK2']
Nodes in the EGF/EGFR pathways:  ['ABI1', 'ABL1', 'AKT1', 'AP2A1', 'AP2B1', 'AP2M1', 'AP2S1', 'ARF6', 'ARHGEF1', 'ASAP1', 'ATF1', 'ATXN2', 'AURKA', 'BCAR1', 'BRAF', 'CAMK2A', 'CAV1', 'CAV2', 'CBL', 'CBLB', 'CBLC', 'CDC42', 'CFL1', 'CREB1', 'CRK', 'CRKL', 'CSK', 'DNM1', 'DOK2', 'E2F1', 'EGF', 'EGFR', 'EIF4EBP1', 'ELK1', 'ELK4', 'EPN1', 'EPS15', 'EPS15L1', 'EPS8', 'ERBB2', 'ERRFI1', 'FOS', 'FOSB', 'FOXO1', 'FOXO4', 'GAB1', 'GAB2', 'GJA1', 'GRB10', 'GRB2', 'HGS', 'HRAS', 'INPP5D', 'INPPL1', 'IQGAP1', 'IQSEC1', 'ITCH', 'JAK1', 'JAK2', 'JUN', 'JUND', 'KRAS', 'LIMK2', 'MAP2K1', 'MAP2K2', 'MAP2K5', 'MAP3K1', 'MAP3K2', 'MAP3K3', 'MAP3K4', 'MAP4K1', 'MAPK1', 'MAPK14', 'MAPK4', 'MAPK7', 'MAPK8', 'MAPK9', 'MEF2A', 'MEF2C', 'MEF2D', 'MT-CO2', 'MTOR', 'MYBL2', 'NCK1', 'NCK2', 'NCOA3', 'NDUFA13', 'NEDD4', 'NEDD8', 'NOS3', 'PAK1', 'PCNA', 'PDPK1', 'PEBP1', 'PIAS3', 'PIK3C2B', 'PIK3R1', 'PIK3R2', 'PLCE1', 'PLCG1', 'PLD1', 'PLD2', 'PLSCR1', 'PRKCA', 'PRKCB', 'PRKCD', 'PRKCI', 'PRKCZ', 'PTEN', 'PTK2', 'PTK2B', 'PTK6', 'PTPN11', 'PTPN12', 'PTPN5', 'PTPRR', 'PXDN', 'RAB5A', 'RAC1', 'RAF1', 'RALA', 'RALB', 'RALBP1', 'RALGDS', 'RAP1A', 'RASA1', 'REPS2', 'RICTOR', 'RIN1', 'ROCK1', 'RPS6KA1', 'RPS6KA2', 'RPS6KA3', 'RPS6KA5', 'RPS6KB1', 'SH2D2A', 'SH3GL2', 'SH3GL3', 'SH3KBP1', 'SHC1', 'SOS1', 'SOS2', 'SP1', 'SPRY2', 'SRC', 'STAM', 'STAM2', 'STAMBP', 'STAT1', 'STAT3', 'STAT5A', 'STAT5B', 'STMN1', 'STXBP1', 'SYNJ1', 'TNK2', 'TWIST1', 'USP6NL', 'USP8', 'VAV1', 'VAV2', 'VAV3']
Nodes found:  ['ABL1', 'AKT1', 'BRAF', 'CAMK2A', 'CAV1', 'CBL', 'CBLB', 'CBLC', 'CDC42', 'CREB1', 'E2F1', 'EGF', 'EGFR', 'EIF4EBP1', 'ELK4', 'EPN1', 'EPS15', 'EPS15L1', 'ERBB2', 'FOS', 'GAB1', 'GAB2', 'GRB10', 'GRB2', 'HGS', 'HRAS', 'IQGAP1', 'JAK1', 'JAK2', 'KRAS', 'LIMK2', 'MAP2K1', 'MAP3K1', 'MAPK1', 'MAPK14', 'MAPK7', 'MAPK8', 'MAPK9', 'MTOR', 'NCK1', 'PAK1', 'PDPK1', 'PEBP1', 'PIK3R1', 'PIK3R2', 'PLCG1', 'PLD1', 'PRKCA', 'PRKCB', 'PRKCD', 'PRKCZ', 'PTEN', 'PTK2', 'PTK2B', 'PTPN11', 'PTPRR', 'RAC1', 'RAF1', 'RAP1A', 'RASA1', 'RICTOR', 'RPS6KA2', 'RPS6KA5', 'RPS6KB1', 'SH3GL2', 'SHC1', 'SOS1', 'SRC', 'STAT3', 'USP6NL', 'VAV1', 'VAV2', 'VAV3']

[35]:

print("Percentage of genes covered: ", (len(nodes_found)/len(egf_egfr_genes)) * 100)

Percentage of genes covered:  45.06172839506173

[36]:

nodes_found = []
for node in egf_egfr_genes:
    if node in list(neko_net4.nodes["Genesymbol"]):
        nodes_found.append(node)

print("Initial nodes: ", random_seeds)
print("Nodes in the EGF/EGFR pathways: ", egf_egfr_genes)
print("Nodes found: ", nodes_found)

Initial nodes:  ['PEBP1', 'SOS1', 'GAB2', 'CDC42', 'CBLC', 'EIF4EBP1', 'HGS', 'RICTOR', 'PTPRR', 'GRB2', 'RAF1', 'RAC1', 'MTOR', 'ELK4', 'RPS6KA5', 'EPS15L1', 'USP6NL', 'PIK3R1', 'ABL1', 'PTK2']
Nodes in the EGF/EGFR pathways:  ['ABI1', 'ABL1', 'AKT1', 'AP2A1', 'AP2B1', 'AP2M1', 'AP2S1', 'ARF6', 'ARHGEF1', 'ASAP1', 'ATF1', 'ATXN2', 'AURKA', 'BCAR1', 'BRAF', 'CAMK2A', 'CAV1', 'CAV2', 'CBL', 'CBLB', 'CBLC', 'CDC42', 'CFL1', 'CREB1', 'CRK', 'CRKL', 'CSK', 'DNM1', 'DOK2', 'E2F1', 'EGF', 'EGFR', 'EIF4EBP1', 'ELK1', 'ELK4', 'EPN1', 'EPS15', 'EPS15L1', 'EPS8', 'ERBB2', 'ERRFI1', 'FOS', 'FOSB', 'FOXO1', 'FOXO4', 'GAB1', 'GAB2', 'GJA1', 'GRB10', 'GRB2', 'HGS', 'HRAS', 'INPP5D', 'INPPL1', 'IQGAP1', 'IQSEC1', 'ITCH', 'JAK1', 'JAK2', 'JUN', 'JUND', 'KRAS', 'LIMK2', 'MAP2K1', 'MAP2K2', 'MAP2K5', 'MAP3K1', 'MAP3K2', 'MAP3K3', 'MAP3K4', 'MAP4K1', 'MAPK1', 'MAPK14', 'MAPK4', 'MAPK7', 'MAPK8', 'MAPK9', 'MEF2A', 'MEF2C', 'MEF2D', 'MT-CO2', 'MTOR', 'MYBL2', 'NCK1', 'NCK2', 'NCOA3', 'NDUFA13', 'NEDD4', 'NEDD8', 'NOS3', 'PAK1', 'PCNA', 'PDPK1', 'PEBP1', 'PIAS3', 'PIK3C2B', 'PIK3R1', 'PIK3R2', 'PLCE1', 'PLCG1', 'PLD1', 'PLD2', 'PLSCR1', 'PRKCA', 'PRKCB', 'PRKCD', 'PRKCI', 'PRKCZ', 'PTEN', 'PTK2', 'PTK2B', 'PTK6', 'PTPN11', 'PTPN12', 'PTPN5', 'PTPRR', 'PXDN', 'RAB5A', 'RAC1', 'RAF1', 'RALA', 'RALB', 'RALBP1', 'RALGDS', 'RAP1A', 'RASA1', 'REPS2', 'RICTOR', 'RIN1', 'ROCK1', 'RPS6KA1', 'RPS6KA2', 'RPS6KA3', 'RPS6KA5', 'RPS6KB1', 'SH2D2A', 'SH3GL2', 'SH3GL3', 'SH3KBP1', 'SHC1', 'SOS1', 'SOS2', 'SP1', 'SPRY2', 'SRC', 'STAM', 'STAM2', 'STAMBP', 'STAT1', 'STAT3', 'STAT5A', 'STAT5B', 'STMN1', 'STXBP1', 'SYNJ1', 'TNK2', 'TWIST1', 'USP6NL', 'USP8', 'VAV1', 'VAV2', 'VAV3']
Nodes found:  ['ABI1', 'ABL1', 'AKT1', 'ARF6', 'ASAP1', 'BCAR1', 'CAMK2A', 'CAV1', 'CBL', 'CBLB', 'CBLC', 'CDC42', 'CRK', 'CRKL', 'CSK', 'EGF', 'EGFR', 'EIF4EBP1', 'ELK4', 'EPN1', 'EPS15', 'EPS15L1', 'GAB1', 'GAB2', 'GRB10', 'GRB2', 'HGS', 'HRAS', 'INPP5D', 'IQGAP1', 'JAK1', 'JAK2', 'KRAS', 'MAP2K1', 'MAP3K4', 'MAPK1', 'MAPK14', 'MAPK7', 'MAPK8', 'MTOR', 'NCK1', 'NCK2', 'PAK1', 'PEBP1', 'PIK3R1', 'PIK3R2', 'PLCG1', 'PLD1', 'PRKCA', 'PRKCD', 'PRKCZ', 'PTEN', 'PTK2', 'PTK2B', 'PTPN11', 'PTPN12', 'PTPRR', 'RAC1', 'RAF1', 'RALGDS', 'RAP1A', 'RASA1', 'RICTOR', 'RIN1', 'ROCK1', 'RPS6KA5', 'RPS6KB1', 'SH3GL2', 'SHC1', 'SOS1', 'SRC', 'USP6NL', 'VAV1', 'VAV2', 'VAV3']

[37]:

print("Percentage of genes covered: ", (len(nodes_found)/len(egf_egfr_genes)) * 100)

Percentage of genes covered:  46.2962962962963

Once more, INE captured the pathway composition slightly better than RPE: it recovered 46.3% of the genes in the EGF/EGFR pathway from WikiPathways, against 45.1% for RPE, starting from 20 seed genes (about 12% of the pathway). This time, INE was also faster than RPE (41.7 s vs 3 min 18 s). The reason lies in the database composition. The Omnipath database contains a very large number of interactions, which means many possible ways to go from gene A to gene B within the path length limit (maxlen = 3). INE, on the other hand, runs with max_len = 1, so it just collects all the neighbors of the seed nodes, without searching for longer paths.

Please remember that the aim of the RPE algorithm is to reduce the average network distance as much as possible, whereas the INE algorithm does not take it into account.