Build network using user-defined resources
This notebook explores how to add other individual resources from OmniPath or other public databases.
[35]:
%%time
from neko.core.network import Network
from neko._visual.visualize_network import NetworkVisualizer
from neko.inputs import Universe, signor
import omnipath as op
import pandas as pd
CPU times: user 14 µs, sys: 1 µs, total: 15 µs
Wall time: 16.9 µs
1. Adding a resource already in OmniPath
1A. Specify the interaction resource of interest
[36]:
collectri = op.interactions.CollecTRI.get()
1B. Add new resource to the Resources object
[37]:
resources = Universe()
resources.add_resources(collectri, name = 'collectri')
resources.build()
[38]:
resources.interactions
[38]:
index | source | target | is_directed | is_stimulation | is_inhibition | consensus_direction | consensus_stimulation | consensus_inhibition | curation_effort | references | sources | n_sources | n_primary_sources | n_references | references_stripped | form_complex |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | P01106 | O14746 | False | True | False | True | True | False | 82 | CollecTRI:10022128;CollecTRI:10491298;CollecTR... | CollecTRI;DoRothEA-A_CollecTRI;ExTRI_CollecTRI... | 8 | 1 | 74 | 10022128;10491298;10606235;10637317;10723141;1... | False |
1 | P17947 | P02818 | False | True | False | True | True | False | 3 | CollecTRI:10022617 | CollecTRI;ExTRI_CollecTRI | 2 | 1 | 1 | 10022617 | False |
2 | COMPLEX:P15407_P17275 | P05412 | False | True | False | True | True | False | 53 | CollecTRI:10022869;CollecTRI:10037172;CollecTR... | CollecTRI;ExTRI_CollecTRI;NTNU.Curated_CollecT... | 4 | 1 | 49 | 10022869;10037172;10208431;10366004;11281649;1... | False |
3 | COMPLEX:P01100_P05412 | P05412 | False | True | False | True | True | False | 53 | CollecTRI:10022869;CollecTRI:10037172;CollecTR... | CollecTRI;ExTRI_CollecTRI;NTNU.Curated_CollecT... | 4 | 1 | 49 | 10022869;10037172;10208431;10366004;11281649;1... | False |
4 | COMPLEX:P01100_P17275 | P05412 | False | True | False | True | True | False | 53 | CollecTRI:10022869;CollecTRI:10037172;CollecTR... | CollecTRI;ExTRI_CollecTRI;NTNU.Curated_CollecT... | 4 | 1 | 49 | 10022869;10037172;10208431;10366004;11281649;1... | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
64490 | Q01196 | Q13094 | False | True | False | True | True | False | 3 | CollecTRI:20019798 | CollecTRI;DoRothEA-A_CollecTRI | 2 | 1 | 1 | 20019798 | False |
64491 | Q01196 | Q6MZQ0 | False | True | False | True | True | False | 3 | CollecTRI:20019798 | CollecTRI;DoRothEA-A_CollecTRI | 2 | 1 | 1 | 20019798 | False |
64492 | Q15672 | P08151 | False | True | False | True | True | False | 3 | CollecTRI:11948912 | CollecTRI;DoRothEA-A_CollecTRI | 2 | 1 | 1 | 11948912 | False |
64493 | P22415 | Q5SRE5 | False | True | False | True | True | False | 3 | CollecTRI:22951020 | CollecTRI;DoRothEA-A_CollecTRI | 2 | 1 | 1 | 22951020 | False |
64494 | Q9UQR1 | Q5VYX0 | False | True | False | True | True | False | 3 | CollecTRI:25295465 | CollecTRI;DoRothEA-A_CollecTRI | 2 | 1 | 1 | 25295465 | False |
64495 rows × 16 columns
After adding a new resource, it is important to build the database, as shown in the previous cell. These steps ensure that the incoming database is compatible with the NeKo structure, which helps avoid mistakes when running the connection algorithms.
Sometimes a WARNING suggests that some interactions could be missing; while this limits the amount of knowledge NeKo can extract, it does not prevent the package from working. Another possible cause of a WARNING is the absence of the “consensus” columns. To avoid errors when these columns are absent, always set consensus to False.
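One way to decide whether consensus filtering is safe is to check for the consensus columns before connecting nodes. The sketch below is not part of NeKo: the helper name `missing_consensus_columns` is hypothetical, and the expected column names are simply those visible in the OmniPath table shown above.

```python
# Hypothetical helper (not part of NeKo): report which consensus columns are absent.
CONSENSUS_COLUMNS = (
    "consensus_direction",
    "consensus_stimulation",
    "consensus_inhibition",
)


def missing_consensus_columns(columns):
    """Return the consensus columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in CONSENSUS_COLUMNS if name not in present]


# Column names of a resource that carries no consensus information.
example_columns = ["source", "target", "is_directed", "is_stimulation",
                   "is_inhibition", "form_complex"]
missing = missing_consensus_columns(example_columns)

# Consensus filtering is only safe when every consensus column exists.
use_consensus = not missing
print(missing, use_consensus)
```

With a built resource object, the same check could presumably be run on `resources.interactions.columns`, since any iterable of column names is accepted.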
2. Adding a public database
Alternatively, users might want to use their own interaction databases. We have already implemented support for some widely used databases.
As an example, we show here how to integrate the Signor 3.0 database. To do so, the user needs to have already downloaded the whole Signor database, available at the following link: https://signor.uniroma2.it/downloads.php
2A. Add Signor database
[39]:
resources = Universe()
resources = signor("../neko/_data/signor_db.tsv") # this function accepts only tab-separated values
resources.build()
[40]:
resources.interactions
[40]:
index | source | target | is_directed | is_stimulation | is_inhibition | form_complex | consensus_direction | consensus_stimulation | consensus_inhibition | curation_effort | references | sources |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | A0A024RAD5 | SIGNOR-C535 | True | False | False | True | False | False | False | miannu | 31831667 | SIGNOR-272062 |
1 | A0A0B4J2F0 | P18848 | True | False | True | False | False | False | False | miannu | 31653868 | SIGNOR-261041 |
2 | A0A0B4J2F0 | P35638 | True | False | True | False | False | False | False | miannu | 31653868 | SIGNOR-261043 |
3 | A0A0B4J2F0 | SIGNOR-PH2 | True | False | True | False | False | False | False | miannu | 31653868 | SIGNOR-261042 |
4 | A0AVT1 | SIGNOR-C496 | True | False | False | True | False | False | False | miannu | 24816100 | SIGNOR-270835 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
28850 | URS000075C808_9606 | Q6ZN04 | True | True | False | False | False | False | False | miannu | 24326307 | SIGNOR-272092 |
28851 | URS000075C808_9606 | Q86Y13 | True | True | False | False | False | False | False | miannu | 24326307 | SIGNOR-272091 |
28852 | URS000075CF56_9606 | P05019 | True | False | True | False | False | False | False | miannu | 25477897 | SIGNOR-255793 |
28853 | URS000075CF56_9606 | P23759 | True | False | True | False | False | False | False | irozzo | 24708856 | SIGNOR-256124 |
28854 | URS000075D8A0_9606 | P61073 | True | True | False | False | False | False | False | Luisa | 20516212 | SIGNOR-268952 |
28855 rows × 12 columns
To avoid conflicts or errors, ensure that the file contains at least the following columns: IDA, IDB, EFFECT, ANNOTATOR, PMID, SIGNOR_ID.
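The column requirement can be verified before handing the file to NeKo. The following is a minimal sketch using only the standard library; the helper name `missing_signor_columns` is hypothetical, and the in-memory header is an illustrative stand-in for a real download.

```python
import csv
import io

# Columns listed above as required for a Signor export.
REQUIRED_SIGNOR_COLUMNS = ("IDA", "IDB", "EFFECT", "ANNOTATOR", "PMID", "SIGNOR_ID")


def missing_signor_columns(handle):
    """Read only the header row of a tab-separated file and list missing columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in REQUIRED_SIGNOR_COLUMNS if name not in header]


# Illustrative header with ANNOTATOR deliberately left out. A real file would be
# opened with open(path, newline="", encoding="utf-8") instead of io.StringIO.
sample = io.StringIO("ENTITYA\tIDA\tENTITYB\tIDB\tEFFECT\tPMID\tSIGNOR_ID\n")
print(missing_signor_columns(sample))  # ['ANNOTATOR']
```

An empty result means the header contains every required column; the check reads a single line, so it stays cheap even for the full database.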
Note
SIGNOR uses different identifiers for complexes, protein families, phenotypes, etc. The network may therefore contain node names such as “Signor_pf32”. These are not yet translated, but if you are interested in what those nodes consist of, you can download the SIGNOR vocabulary for these entities: https://signor.uniroma2.it/downloads.php
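To see which nodes would need a lookup in the SIGNOR vocabulary, the SIGNOR-specific identifiers can be separated from the rest. This is a sketch only: the helper `signor_specific_nodes` is hypothetical, and the `SIGNOR-` prefix is an assumption based on the identifiers visible in the interactions table above.

```python
# Hypothetical helper: pick out SIGNOR-specific entities from a list of node IDs.
def signor_specific_nodes(node_ids, prefix="SIGNOR-"):
    """Return IDs starting with `prefix`, in first-seen order and without duplicates."""
    seen = set()
    result = []
    for node_id in node_ids:
        if node_id.startswith(prefix) and node_id not in seen:
            seen.add(node_id)
            result.append(node_id)
    return result


# Identifiers taken from the interactions table shown above.
nodes = ["A0A024RAD5", "SIGNOR-C535", "A0A0B4J2F0", "P18848",
         "SIGNOR-PH2", "SIGNOR-C535"]
print(signor_specific_nodes(nodes))  # ['SIGNOR-C535', 'SIGNOR-PH2']
```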
3. Build the network
3A. Import genes as network nodes
[41]:
genes = ["SRC", "NOTCH1", "PTK2", "CDH1", "CDH2", "VIM", "MAP4K4", "LATS1", "LATS2"]
3B. Create network object by specifying the interaction resources
[42]:
new_net1 = Network(genes, resources = resources.interactions)
[43]:
#Print node dataframe
new_net1.nodes
[43]:
index | Genesymbol | Uniprot | Type |
---|---|---|---|
0 | SRC | P12931 | NaN |
1 | NOTCH1 | P46531 | NaN |
2 | PTK2 | Q05397 | NaN |
3 | CDH1 | P12830 | NaN |
4 | CDH2 | P19022 | NaN |
5 | VIM | P08670 | NaN |
6 | MAP4K4 | O95819 | NaN |
7 | LATS1 | O95835 | NaN |
8 | LATS2 | Q9NRM7 | NaN |
3C. Build network
The downstream steps to connect your nodes are the same. Please see the Network building tutorial for detailed explanations of each step.
[44]:
%%time
new_net1.connect_nodes(only_signed=True, consensus_only=False)
CPU times: user 24.8 ms, sys: 945 µs, total: 25.8 ms
Wall time: 25.2 ms
[45]:
visualizer = NetworkVisualizer(new_net1, color_by='effect', noi=True)
visualizer.render()
Note
NB! Not all databases have the same structure as OmniPath. In particular, if the “consensus” columns are missing, we suggest avoiding the consensus or consensus_only flags when using NeKo with customized databases. As general advice, we suggest always using consensus=False when working with a database other than OmniPath.
[46]:
%%time
new_net1.complete_connection(maxlen=3, algorithm="bfs", only_signed=True, connect_with_bias=False, consensus=False)
CPU times: user 2.59 s, sys: 112 ms, total: 2.7 s
Wall time: 2.7 s
[47]:
#Visualize network
visualizer1 = NetworkVisualizer(new_net1, color_by='effect', noi=True)
visualizer1.render()
4. Translate IDs
NeKo’s resource object relies on UniProt IDs. When a user-defined database does not use UniProt IDs, NeKo offers a function to translate between different ID types. The translation module is based on the Python package “Unipressed” (https://github.com/multimeric/Unipressed).
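A quick way to tell whether a database needs translation is to test its identifiers against the UniProt accession format. The sketch below is independent of NeKo and Unipressed; the helper `non_uniprot_ids` is hypothetical, the regular expression is the accession pattern documented by UniProt, and the sample IDs are ones that appear in this notebook.

```python
import re

# Accession pattern as documented in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def non_uniprot_ids(identifiers):
    """Return the identifiers that do not look like UniProt accessions."""
    return [i for i in identifiers if not UNIPROT_ACCESSION.fullmatch(i)]


ids = ["P12931", "Q9H2S6", "A0A024RAD5", "ENSG00000276076", "SIGNOR-C535"]
print(non_uniprot_ids(ids))  # ['ENSG00000276076', 'SIGNOR-C535']
```

The check is purely syntactic: it flags identifiers of a different type, but it cannot confirm that a well-formed accession actually exists in UniProt.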
In the example below, we use the HURI database (http://www.interactome-atlas.org/), which provides protein-protein interactions (PPIs) with Ensembl IDs.
To run the following example, please download the database at the following link: http://www.interactome-atlas.org/download. The example uses the HI-Union version of the database in .tsv format.
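Before starting a translation, it can help to know how many distinct identifiers the file contains, since that drives the number of lookups. The sketch below assumes the file is a headerless, tab-separated table with the two interactors in the first two columns; the helper `unique_ids` is hypothetical, and the two sample rows reuse Ensembl gene IDs that appear later in this notebook.

```python
import csv
import io


def unique_ids(handle):
    """Collect the distinct identifiers from a headerless two-column TSV."""
    ids = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            ids.update(row[:2])
    return ids


# Illustrative in-memory rows; a real file would be opened with
# open("HI-union.tsv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "ENSG00000276076\tENSG00000276076\n"
    "ENSG00000280987\tENSG00000280987\n"
)
print(len(unique_ids(sample)))  # 2
```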
[13]:
#Import the module
from neko.inputs.db_translator import IDTranslator
The IDTranslator class receives as arguments the input and output file names, together with the existing ID type and the target one (in this case, Ensembl to UniProtKB IDs).
Note
NB! The IDTranslator can take several minutes the first time you translate a database, depending on the size of the database.
[43]:
translator = IDTranslator('HI-union.tsv', 'genes_translated.csv', 'Ensembl', 'UniProtKB-Swiss-Prot', has_header=False, input_columns=['source', 'target'], processes=12)
[44]:
%%time
translator.run()
2024-07-22 12:03:12,217 - IDTranslator_140412855874560 - INFO - Starting ID translation process from Ensembl to UniProtKB-Swiss-Prot
INFO:IDTranslator_140412855874560:Starting ID translation process from Ensembl to UniProtKB-Swiss-Prot
2024-07-22 12:03:12,301 - IDTranslator_140412855874560 - INFO - Loaded 64006 rows with 9094 unique IDs
INFO:IDTranslator_140412855874560:Loaded 64006 rows with 9094 unique IDs
2024-07-22 12:03:12,314 - IDTranslator_140412855874560 - INFO - Loaded progress from checkpoint_batch_91.pkl
INFO:IDTranslator_140412855874560:Loaded progress from checkpoint_batch_91.pkl
2024-07-22 12:03:12,627 - IDTranslator_140412855874560 - INFO - Applying translation to dataframe...
INFO:IDTranslator_140412855874560:Applying translation to dataframe...
2024-07-22 12:05:01,954 - IDTranslator_140412855874560 - INFO - Results saved to genes_translated.csv
INFO:IDTranslator_140412855874560:Results saved to genes_translated.csv
2024-07-22 12:05:02,150 - IDTranslator_140412855874560 - INFO - ID translation process completed in 109.93 seconds
INFO:IDTranslator_140412855874560:ID translation process completed in 109.93 seconds
2024-07-22 12:05:02,170 - IDTranslator_140412855874560 - INFO - Original entry count: 64065
INFO:IDTranslator_140412855874560:Original entry count: 64065
2024-07-22 12:05:02,173 - IDTranslator_140412855874560 - INFO - Translated entry count: 64065
INFO:IDTranslator_140412855874560:Translated entry count: 64065
2024-07-22 12:05:02,178 - IDTranslator_140412855874560 - INFO - Expansion factor: 1.00
INFO:IDTranslator_140412855874560:Expansion factor: 1.00
2024-07-22 12:05:02,182 - IDTranslator_140412855874560 - INFO - Translation success rate: 100.00%
INFO:IDTranslator_140412855874560:Translation success rate: 100.00%
CPU times: user 1min 49s, sys: 727 ms, total: 1min 49s
Wall time: 1min 49s
[45]:
translator.remove_untranslated_entries()
2024-07-22 12:05:02,439 - IDTranslator_140412855874560 - INFO - Results saved to genes_translated_cleaned.csv
INFO:IDTranslator_140412855874560:Results saved to genes_translated_cleaned.csv
2024-07-22 12:05:02,442 - IDTranslator_140412855874560 - INFO - Removed 0 untranslated entries.
INFO:IDTranslator_140412855874560:Removed 0 untranslated entries.
2024-07-22 12:05:02,446 - IDTranslator_140412855874560 - INFO - Cleaned database saved to genes_translated_cleaned.csv
INFO:IDTranslator_140412855874560:Cleaned database saved to genes_translated_cleaned.csv
2024-07-22 12:05:02,450 - IDTranslator_140412855874560 - INFO - Original entry count: 64065
INFO:IDTranslator_140412855874560:Original entry count: 64065
2024-07-22 12:05:02,453 - IDTranslator_140412855874560 - INFO - Cleaned entry count: 64065
INFO:IDTranslator_140412855874560:Cleaned entry count: 64065
2024-07-22 12:05:02,472 - IDTranslator_140412855874560 - INFO - Original entry count: 64065
INFO:IDTranslator_140412855874560:Original entry count: 64065
2024-07-22 12:05:02,475 - IDTranslator_140412855874560 - INFO - Translated entry count: 64065
INFO:IDTranslator_140412855874560:Translated entry count: 64065
2024-07-22 12:05:02,479 - IDTranslator_140412855874560 - INFO - Expansion factor: 1.00
INFO:IDTranslator_140412855874560:Expansion factor: 1.00
2024-07-22 12:05:02,483 - IDTranslator_140412855874560 - INFO - Translation success rate: 100.00%
INFO:IDTranslator_140412855874560:Translation success rate: 100.00%
[14]:
huri = pd.read_csv("genes_translated_cleaned.csv", usecols=["source_UniProtKB-Swiss-Prot", "target_UniProtKB-Swiss-Prot"])
[15]:
huri.head()
[15]:
index | source_UniProtKB-Swiss-Prot | target_UniProtKB-Swiss-Prot |
---|---|---|
0 | Q9H2S6 | Q9NPE6 |
1 | Q9H2S6 | Q9BXK5 |
2 | Q9H2S6 | O60238 |
3 | Q9H2S6 | P20138 |
4 | Q9H2S6 | Q9UM44 |
[27]:
resources = Universe()
[28]:
mapping = {"source_UniProtKB-Swiss-Prot": "source", "target_UniProtKB-Swiss-Prot": "target"}
resources.add_resources(huri, columns=mapping, reset_index=True)
[29]:
resources.build()
[30]:
resources.interactions
[30]:
index | source | target | is_directed | is_stimulation | is_inhibition | form_complex |
---|---|---|---|---|---|---|
0 | Q9H2S6 | Q9NPE6 | False | False | False | False |
1 | Q9H2S6 | Q9BXK5 | False | False | False | False |
2 | Q9H2S6 | O60238 | False | False | False | False |
3 | Q9H2S6 | P20138 | False | False | False | False |
4 | Q9H2S6 | Q9UM44 | False | False | False | False |
... | ... | ... | ... | ... | ... | ... |
64060 | B2RXH8 | B2RXH8 | False | False | False | False |
64061 | Q8NHW4 | Q6IN84 | False | False | False | False |
64062 | ENSG00000276076 | ENSG00000276076 | False | False | False | False |
64063 | Q9UI36 | Q9UI36 | False | False | False | False |
64064 | ENSG00000280987 | ENSG00000280987 | False | False | False | False |
64065 rows × 6 columns
[31]:
genes = ["CD33", "TNMD", "AMIGO1"]
[32]:
new_net1 = Network(genes, resources=resources.interactions)
[33]:
new_net1.connect_network_radially(max_len=1, only_signed=False, consensus=False)
[34]:
visualizer = NetworkVisualizer(new_net1, color_by='effect')
visualizer.render()