Build network using user-defined resources

This notebook shows how to add individual interaction resources, either from OmniPath or from other public databases.

[35]:
%%time
from neko.core.network import Network
from neko._visual.visualize_network import NetworkVisualizer
from neko.inputs import Universe, signor
import omnipath as op
import pandas as pd
CPU times: user 14 µs, sys: 1 µs, total: 15 µs
Wall time: 16.9 µs

1. Adding a resource already in OmniPath

1A. Specify the interaction resource of interest

[36]:
collectri = op.interactions.CollecTRI.get()

1B. Add new resource to the Resources object

[37]:
resources = Universe()
resources.add_resources(collectri, name='collectri')
resources.build()
[38]:
resources.interactions
[38]:
source target is_directed is_stimulation is_inhibition consensus_direction consensus_stimulation consensus_inhibition curation_effort references sources n_sources n_primary_sources n_references references_stripped form_complex
0 P01106 O14746 False True False True True False 82 CollecTRI:10022128;CollecTRI:10491298;CollecTR... CollecTRI;DoRothEA-A_CollecTRI;ExTRI_CollecTRI... 8 1 74 10022128;10491298;10606235;10637317;10723141;1... False
1 P17947 P02818 False True False True True False 3 CollecTRI:10022617 CollecTRI;ExTRI_CollecTRI 2 1 1 10022617 False
2 COMPLEX:P15407_P17275 P05412 False True False True True False 53 CollecTRI:10022869;CollecTRI:10037172;CollecTR... CollecTRI;ExTRI_CollecTRI;NTNU.Curated_CollecT... 4 1 49 10022869;10037172;10208431;10366004;11281649;1... False
3 COMPLEX:P01100_P05412 P05412 False True False True True False 53 CollecTRI:10022869;CollecTRI:10037172;CollecTR... CollecTRI;ExTRI_CollecTRI;NTNU.Curated_CollecT... 4 1 49 10022869;10037172;10208431;10366004;11281649;1... False
4 COMPLEX:P01100_P17275 P05412 False True False True True False 53 CollecTRI:10022869;CollecTRI:10037172;CollecTR... CollecTRI;ExTRI_CollecTRI;NTNU.Curated_CollecT... 4 1 49 10022869;10037172;10208431;10366004;11281649;1... False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
64490 Q01196 Q13094 False True False True True False 3 CollecTRI:20019798 CollecTRI;DoRothEA-A_CollecTRI 2 1 1 20019798 False
64491 Q01196 Q6MZQ0 False True False True True False 3 CollecTRI:20019798 CollecTRI;DoRothEA-A_CollecTRI 2 1 1 20019798 False
64492 Q15672 P08151 False True False True True False 3 CollecTRI:11948912 CollecTRI;DoRothEA-A_CollecTRI 2 1 1 11948912 False
64493 P22415 Q5SRE5 False True False True True False 3 CollecTRI:22951020 CollecTRI;DoRothEA-A_CollecTRI 2 1 1 22951020 False
64494 Q9UQR1 Q5VYX0 False True False True True False 3 CollecTRI:25295465 CollecTRI;DoRothEA-A_CollecTRI 2 1 1 25295465 False

64495 rows × 16 columns

After adding a new resource, it is important to build the database, as shown in the previous cell. This step ensures that the incoming database is compatible with the NeKo structure and helps avoid errors when running the connection algorithms.

Sometimes a WARNING may indicate that some interactions are missing. This limits the amount of knowledge NeKo can extract, but it does not prevent the package from working. Another possible cause of a WARNING is the absence of the “consensus” columns. To avoid errors in that case, always set consensus to False.
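To make the idea of a compatibility check concrete, here is a minimal pandas sketch that verifies an interaction table has the core columns shown in every `resources.interactions` output in this notebook (source, target, is_directed, is_stimulation, is_inhibition, form_complex) and that the flag columns are boolean. The helper `check_interactions` is hypothetical and only illustrates the kind of check involved; it is not what `Universe.build()` does internally.

```python
import pandas as pd

# Core columns present in every resources.interactions table shown in this notebook
CORE_COLUMNS = ["source", "target", "is_directed", "is_stimulation",
                "is_inhibition", "form_complex"]


def check_interactions(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in an interaction table (empty if none).

    Hypothetical helper for illustration; not part of NeKo.
    """
    problems = []
    missing = [col for col in CORE_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Flag columns should hold booleans so they can be used as row masks
    for col in CORE_COLUMNS[2:]:
        if col in df.columns and df[col].dtype != bool:
            problems.append(f"column {col!r} is not boolean")
    return problems


toy = pd.DataFrame({
    "source": ["P01106"], "target": ["O14746"],
    "is_directed": [False], "is_stimulation": [True],
    "is_inhibition": [False], "form_complex": [False],
})
print(check_interactions(toy))  # []
print(check_interactions(toy.drop(columns=["form_complex"])))  # ["missing columns: ['form_complex']"]
```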

2. Adding a public database

Alternatively, users may want to use their own interaction databases. NeKo already provides import functions for some widely used databases.

As an example, we show here how to integrate the SIGNOR 3.0 database. To do so, you first need to download the complete SIGNOR database from https://signor.uniroma2.it/downloads.php.

2A. Add Signor database

[39]:
resources = Universe()
resources = signor("../neko/_data/signor_db.tsv")  # this function accepts only tab-separated values (.tsv)
resources.build()
[40]:
resources.interactions
[40]:
source target is_directed is_stimulation is_inhibition form_complex consensus_direction consensus_stimulation consensus_inhibition curation_effort references sources
0 A0A024RAD5 SIGNOR-C535 True False False True False False False miannu 31831667 SIGNOR-272062
1 A0A0B4J2F0 P18848 True False True False False False False miannu 31653868 SIGNOR-261041
2 A0A0B4J2F0 P35638 True False True False False False False miannu 31653868 SIGNOR-261043
3 A0A0B4J2F0 SIGNOR-PH2 True False True False False False False miannu 31653868 SIGNOR-261042
4 A0AVT1 SIGNOR-C496 True False False True False False False miannu 24816100 SIGNOR-270835
... ... ... ... ... ... ... ... ... ... ... ... ...
28850 URS000075C808_9606 Q6ZN04 True True False False False False False miannu 24326307 SIGNOR-272092
28851 URS000075C808_9606 Q86Y13 True True False False False False False miannu 24326307 SIGNOR-272091
28852 URS000075CF56_9606 P05019 True False True False False False False miannu 25477897 SIGNOR-255793
28853 URS000075CF56_9606 P23759 True False True False False False False irozzo 24708856 SIGNOR-256124
28854 URS000075D8A0_9606 P61073 True True False False False False False Luisa 20516212 SIGNOR-268952

28855 rows × 12 columns

To avoid conflicts or errors, make sure the file contains at least the following columns: IDA, IDB, EFFECT, ANNOTATOR, PMID, SIGNOR_ID.
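Before calling `signor(...)`, you can check a downloaded file against this column list. The sketch below is a hypothetical helper (not part of NeKo) that reads only the header row of a tab-separated file with pandas and returns the required columns it lacks.

```python
import io

import pandas as pd

# Minimum SIGNOR columns listed in the text above
REQUIRED_SIGNOR_COLUMNS = {"IDA", "IDB", "EFFECT", "ANNOTATOR", "PMID", "SIGNOR_ID"}


def missing_signor_columns(path_or_buffer) -> set[str]:
    """Return the required SIGNOR columns absent from a tab-separated file.

    Only the header row is parsed (nrows=0), so this is cheap even for the full dump.
    """
    header = pd.read_csv(path_or_buffer, sep="\t", nrows=0)
    return REQUIRED_SIGNOR_COLUMNS - set(header.columns)


# Toy file with one required column (SIGNOR_ID) left out
toy_tsv = io.StringIO("IDA\tIDB\tEFFECT\tANNOTATOR\tPMID\nP1\tP2\tup-regulates\tmiannu\t123\n")
print(missing_signor_columns(toy_tsv))  # {'SIGNOR_ID'}
```

On the real file, `missing_signor_columns("../neko/_data/signor_db.tsv")` should return an empty set if the download is complete.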

Note

SIGNOR uses its own identifiers for complexes, protein families, phenotypes, etc., so the network may contain node names such as “SIGNOR-C535” or “SIGNOR-PH2” (see the table above). These identifiers are not yet translated. If you want to know what these entities consist of, you can download the SIGNOR entity vocabulary from https://signor.uniroma2.it/downloads.php
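To see which SIGNOR-specific entities ended up in your interaction table, you can collect the source and target IDs that carry the `SIGNOR-` prefix seen in the table above. This is a small pandas sketch; the function name is hypothetical, and it assumes all SIGNOR entity IDs use that prefix. Only the source and target columns are scanned, because the `sources` column also holds `SIGNOR-` identifiers that refer to interactions, not nodes.

```python
import pandas as pd


def signor_entity_ids(interactions: pd.DataFrame) -> list[str]:
    """Sorted unique source/target IDs with the SIGNOR-specific "SIGNOR-" prefix."""
    ids = pd.concat([interactions["source"], interactions["target"]]).astype(str)
    return sorted(set(ids[ids.str.startswith("SIGNOR-")]))


# Toy rows taken from the SIGNOR table above
toy = pd.DataFrame({
    "source": ["A0A024RAD5", "A0A0B4J2F0", "A0A0B4J2F0"],
    "target": ["SIGNOR-C535", "P18848", "SIGNOR-PH2"],
})
print(signor_entity_ids(toy))  # ['SIGNOR-C535', 'SIGNOR-PH2']
```

With the real data, `signor_entity_ids(resources.interactions)` gives the list of IDs to look up in the downloaded vocabulary.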

3. Build the network

3A. Import genes as network nodes

[41]:
genes = ["SRC", "NOTCH1", "PTK2", "CDH1", "CDH2", "VIM", "MAP4K4", "LATS1", "LATS2"]

3B. Create network object by specifying the interaction resources

[42]:
new_net1 = Network(genes, resources=resources.interactions)
[43]:
# Print the node dataframe
new_net1.nodes
[43]:
Genesymbol Uniprot Type
0 SRC P12931 NaN
1 NOTCH1 P46531 NaN
2 PTK2 Q05397 NaN
3 CDH1 P12830 NaN
4 CDH2 P19022 NaN
5 VIM P08670 NaN
6 MAP4K4 O95819 NaN
7 LATS1 O95835 NaN
8 LATS2 Q9NRM7 NaN

3C. Build network

The downstream steps to connect your nodes are the same as with the default resources. See the Network building tutorial for a detailed explanation of each step.

[44]:
%%time
new_net1.connect_nodes(only_signed=True, consensus_only=False)
CPU times: user 24.8 ms, sys: 945 µs, total: 25.8 ms
Wall time: 25.2 ms
[45]:
visualizer = NetworkVisualizer(new_net1, color_by='effect', noi=True)
visualizer.render()
[Figure: network after connect_nodes, edges colored by effect]

Note

NB! Not all databases have the same structure as OmniPath. In particular, if the “consensus” columns are missing, avoid the consensus and consensus_only flags (set them to False) when using NeKo with custom databases. As general advice, always use consensus=False when working with a database other than OmniPath.
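As a rough safeguard, here is a hypothetical helper, `consensus_usable`, that returns True only if all three consensus columns seen in the OmniPath-derived table exist and at least one of them is True somewhere. The second condition matters because the SIGNOR table above does contain these columns, yet every value shown is False. This heuristic is our own sketch, not a NeKo function; for non-OmniPath databases the recommendation above (consensus=False) still applies.

```python
import pandas as pd

# Consensus columns as they appear in the OmniPath-derived table above
CONSENSUS_COLUMNS = ["consensus_direction", "consensus_stimulation", "consensus_inhibition"]


def consensus_usable(interactions: pd.DataFrame) -> bool:
    """Heuristic: True only if all consensus columns exist and any value is True."""
    if not all(col in interactions.columns for col in CONSENSUS_COLUMNS):
        return False  # e.g. the HuRI-based table, which has no consensus columns
    # Columns present but all False (as in the SIGNOR rows shown) carry no information
    return bool(interactions[CONSENSUS_COLUMNS].any().any())


huri_like = pd.DataFrame({"source": ["A"], "target": ["B"], "is_directed": [False]})
print(consensus_usable(huri_like))  # False
```

You could run `consensus_usable(resources.interactions)` before deciding whether setting consensus=True makes sense at all.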

[46]:
%%time
new_net1.complete_connection(maxlen=3, algorithm="bfs", only_signed=True, connect_with_bias=False, consensus=False)
CPU times: user 2.59 s, sys: 112 ms, total: 2.7 s
Wall time: 2.7 s
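`complete_connection` is called above with `algorithm="bfs"` and `maxlen=3`. As an intuition aid only (not NeKo's implementation), a breadth-first search over the interaction table finds the shortest path between two nodes that uses at most `maxlen` edges. The sketch assumes that `only_signed=True` means keeping only rows where `is_stimulation` or `is_inhibition` is True, and it treats every edge as directed from source to target; both are simplifying assumptions. Gene symbols are used for readability, whereas the real tables use UniProt IDs.

```python
from collections import deque

import pandas as pd


def bfs_path(interactions, start, goal, maxlen=3, only_signed=True):
    """Shortest directed path from start to goal with at most maxlen edges, or None."""
    edges = interactions
    if only_signed:
        # Assumption: "signed" means marked as stimulation or inhibition
        edges = edges[edges["is_stimulation"] | edges["is_inhibition"]]
    # Adjacency list: source -> list of targets
    adjacency = edges.groupby("source")["target"].apply(list).to_dict()

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == maxlen:  # this path already uses maxlen edges
            continue
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


toy = pd.DataFrame({
    "source": ["SRC", "PTK2", "X", "SRC"],
    "target": ["PTK2", "CDH1", "CDH1", "X"],
    "is_stimulation": [True, False, False, False],
    "is_inhibition": [False, True, False, False],
})
print(bfs_path(toy, "SRC", "CDH1"))            # ['SRC', 'PTK2', 'CDH1']
print(bfs_path(toy, "SRC", "CDH1", maxlen=1))  # None
```

Marking nodes as visited when they are enqueued keeps the search linear in the number of edges and guarantees that the first path reaching the goal is a shortest one.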
[47]:
# Visualize the network
visualizer1 = NetworkVisualizer(new_net1, color_by='effect', noi=True)
visualizer1.render()
[Figure: network after complete_connection, edges colored by effect]

4. Translate IDs

NeKo’s resource object relies on UniProt IDs. If a user-defined database does not use UniProt IDs, NeKo provides a function to translate between different ID types. The translation module is based on the Python package “Unipressed” (https://github.com/multimeric/Unipressed).
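To illustrate what translating an interaction table involves, here is a pandas-only sketch; it is not the IDTranslator implementation. It assumes you already have a mapping from each original ID to a list of UniProt accessions (in NeKo that lookup goes through Unipressed; the IDs and `mapping` below are made up). Because one ID can map to several accessions, rows are expanded with `explode`, which is why a translated table can have more rows than the original; rows containing an ID without any translation are dropped.

```python
import pandas as pd


def translate_interactions(df, mapping, columns=("source", "target")):
    """Replace IDs in `columns` using `mapping` (ID -> list of new IDs).

    Rows whose IDs have several translations are duplicated (one row per
    combination); rows containing an untranslated ID are dropped.
    """
    out = df.copy()
    for col in columns:
        # Untranslated IDs become NaN; explode yields one row per translation
        out[col] = out[col].map(mapping)
        out = out.explode(col).reset_index(drop=True)
    return out.dropna(subset=list(columns)).reset_index(drop=True)


# Hypothetical IDs and mapping, for illustration only
ppi = pd.DataFrame({"source": ["ENSG0001", "ENSG0002", "ENSG0003"],
                    "target": ["ENSG0002", "ENSG0003", "ENSG0001"]})
mapping = {"ENSG0001": ["P00001"], "ENSG0002": ["P00002", "P00022"]}

translated = translate_interactions(ppi, mapping)
print(translated)
```

Here ENSG0002 maps to two accessions, so the first interaction becomes two rows, while both interactions involving the unmapped ENSG0003 are removed.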

In the example below, we use the HuRI database (http://www.interactome-atlas.org/), which provides protein-protein interactions (PPIs) annotated with Ensembl IDs.

To run the following example, download the database from http://www.interactome-atlas.org/download. The example uses the HI-Union version of the database in .tsv format.

[13]:
# Import the ID translation module
from neko.inputs.db_translator import IDTranslator

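The IDTranslator constructor takes the input and output file names, followed by the source and target ID types (in this case, Ensembl to UniProtKB-Swiss-Prot). In the call below, has_header=False indicates that the input file has no header row, input_columns assigns names to its two columns, and processes=12 appears to set the number of parallel worker processes.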

Note

NB! Translating a database with IDTranslator for the first time can take several minutes, depending on the size of the database.

[43]:
translator = IDTranslator('HI-union.tsv', 'genes_translated.csv', 'Ensembl', 'UniProtKB-Swiss-Prot', has_header=False, input_columns=['source', 'target'], processes=12)
[44]:
%%time
translator.run()
2024-07-22 12:03:12,217 - IDTranslator_140412855874560 - INFO - Starting ID translation process from Ensembl to UniProtKB-Swiss-Prot
2024-07-22 12:03:12,301 - IDTranslator_140412855874560 - INFO - Loaded 64006 rows with 9094 unique IDs
2024-07-22 12:03:12,314 - IDTranslator_140412855874560 - INFO - Loaded progress from checkpoint_batch_91.pkl
2024-07-22 12:03:12,627 - IDTranslator_140412855874560 - INFO - Applying translation to dataframe...
2024-07-22 12:05:01,954 - IDTranslator_140412855874560 - INFO - Results saved to genes_translated.csv
2024-07-22 12:05:02,150 - IDTranslator_140412855874560 - INFO - ID translation process completed in 109.93 seconds
2024-07-22 12:05:02,170 - IDTranslator_140412855874560 - INFO - Original entry count: 64065
2024-07-22 12:05:02,173 - IDTranslator_140412855874560 - INFO - Translated entry count: 64065
2024-07-22 12:05:02,178 - IDTranslator_140412855874560 - INFO - Expansion factor: 1.00
2024-07-22 12:05:02,182 - IDTranslator_140412855874560 - INFO - Translation success rate: 100.00%
CPU times: user 1min 49s, sys: 727 ms, total: 1min 49s
Wall time: 1min 49s
[45]:
translator.remove_untranslated_entries()
2024-07-22 12:05:02,439 - IDTranslator_140412855874560 - INFO - Results saved to genes_translated_cleaned.csv
2024-07-22 12:05:02,442 - IDTranslator_140412855874560 - INFO - Removed 0 untranslated entries.
2024-07-22 12:05:02,446 - IDTranslator_140412855874560 - INFO - Cleaned database saved to genes_translated_cleaned.csv
2024-07-22 12:05:02,450 - IDTranslator_140412855874560 - INFO - Original entry count: 64065
2024-07-22 12:05:02,453 - IDTranslator_140412855874560 - INFO - Cleaned entry count: 64065
2024-07-22 12:05:02,472 - IDTranslator_140412855874560 - INFO - Original entry count: 64065
2024-07-22 12:05:02,475 - IDTranslator_140412855874560 - INFO - Translated entry count: 64065
2024-07-22 12:05:02,479 - IDTranslator_140412855874560 - INFO - Expansion factor: 1.00
2024-07-22 12:05:02,483 - IDTranslator_140412855874560 - INFO - Translation success rate: 100.00%
[14]:
huri = pd.read_csv("genes_translated_cleaned.csv", usecols=["source_UniProtKB-Swiss-Prot", "target_UniProtKB-Swiss-Prot"])
[15]:
huri.head()
[15]:
source_UniProtKB-Swiss-Prot target_UniProtKB-Swiss-Prot
0 Q9H2S6 Q9NPE6
1 Q9H2S6 Q9BXK5
2 Q9H2S6 O60238
3 Q9H2S6 P20138
4 Q9H2S6 Q9UM44
[27]:
resources = Universe()
[28]:
mapping = {"source_UniProtKB-Swiss-Prot": "source", "target_UniProtKB-Swiss-Prot": "target"}
resources.add_resources(huri, columns=mapping, reset_index=True)
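The `columns` mapping renames the translated columns to the `source`/`target` names NeKo uses. Judging from the `resources.interactions` output after building, the HuRI rows then get False in every flag column, which fits an undirected, unsigned PPI resource. The sketch below is a rough pandas equivalent of that end result, written under that assumption; the function name is hypothetical and this is not how `add_resources` is implemented.

```python
import pandas as pd

FLAG_COLUMNS = ["is_directed", "is_stimulation", "is_inhibition", "form_complex"]


def prepare_ppi(df: pd.DataFrame, columns: dict) -> pd.DataFrame:
    """Rename ID columns to source/target and add all-False flag columns."""
    out = df.rename(columns=columns).loc[:, ["source", "target"]].copy()
    out = out.reset_index(drop=True)
    for flag in FLAG_COLUMNS:
        out[flag] = False  # HuRI carries no direction, sign, or complex information
    return out


# Toy rows taken from huri.head() above
huri_toy = pd.DataFrame({
    "source_UniProtKB-Swiss-Prot": ["Q9H2S6", "Q9H2S6"],
    "target_UniProtKB-Swiss-Prot": ["Q9NPE6", "Q9BXK5"],
})
mapping = {"source_UniProtKB-Swiss-Prot": "source", "target_UniProtKB-Swiss-Prot": "target"}
print(prepare_ppi(huri_toy, mapping))
```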
[29]:
resources.build()
[30]:
resources.interactions
[30]:
source target is_directed is_stimulation is_inhibition form_complex
0 Q9H2S6 Q9NPE6 False False False False
1 Q9H2S6 Q9BXK5 False False False False
2 Q9H2S6 O60238 False False False False
3 Q9H2S6 P20138 False False False False
4 Q9H2S6 Q9UM44 False False False False
... ... ... ... ... ... ...
64060 B2RXH8 B2RXH8 False False False False
64061 Q8NHW4 Q6IN84 False False False False
64062 ENSG00000276076 ENSG00000276076 False False False False
64063 Q9UI36 Q9UI36 False False False False
64064 ENSG00000280987 ENSG00000280987 False False False False

64065 rows × 6 columns
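The table above still contains Ensembl gene IDs (for example ENSG00000276076 and ENSG00000280987), even though the translator log reports that 0 untranslated entries were removed, so some IDs appear to have been kept unchanged. A quick pandas check for such leftovers, assuming that human Ensembl gene IDs always start with `ENSG` (the helper name is hypothetical):

```python
import pandas as pd


def ensembl_mask(interactions: pd.DataFrame) -> pd.Series:
    """Boolean mask of rows whose source or target still starts with "ENSG"."""
    source = interactions["source"].astype(str)
    target = interactions["target"].astype(str)
    return source.str.startswith("ENSG") | target.str.startswith("ENSG")


# Toy rows taken from the table above
toy = pd.DataFrame({
    "source": ["Q8NHW4", "ENSG00000276076", "Q9UI36"],
    "target": ["Q6IN84", "ENSG00000276076", "Q9UI36"],
})
print(ensembl_mask(toy).tolist())  # [False, True, False]
```

If you prefer to keep only UniProt-based interactions, filter the table with `resources.interactions[~ensembl_mask(resources.interactions)]`.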

[31]:
genes = ["CD33", "TNMD", "AMIGO1"]
[32]:
new_net1 = Network(genes, resources=resources.interactions)
[33]:
new_net1.connect_network_radially(max_len=1, only_signed=False, consensus=False)
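`connect_network_radially` expands the network outward from the input genes. Assuming that `max_len=1` means adding the direct interaction partners of the seed nodes, the idea can be sketched as a step-wise neighbourhood lookup on the interaction table. Direction is ignored, which suits an undirected PPI resource such as HuRI. This is an illustration under those assumptions, not NeKo's implementation, and the function name is hypothetical; real seeds would be UniProt IDs.

```python
import pandas as pd


def radial_neighbours(interactions: pd.DataFrame, seeds, max_len=1) -> set:
    """Nodes reachable from `seeds` in at most `max_len` steps, ignoring direction."""
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(max_len):
        # Edges touching the current frontier, in either direction
        hits = interactions[interactions["source"].isin(frontier)
                            | interactions["target"].isin(frontier)]
        new_nodes = set(hits["source"]) | set(hits["target"])
        frontier = new_nodes - reached  # only expand from newly reached nodes
        reached |= new_nodes
    return reached


toy = pd.DataFrame({"source": ["A", "B", "C"], "target": ["B", "C", "D"]})
print(sorted(radial_neighbours(toy, ["A"])))             # ['A', 'B']
print(sorted(radial_neighbours(toy, ["A"], max_len=2)))  # ['A', 'B', 'C']
```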
[34]:
visualizer = NetworkVisualizer(new_net1, color_by='effect')
visualizer.render()
[Figure: HuRI-based network after connect_network_radially, edges colored by effect]