CoCitation

The goal is to create a co-citation graph for a list of references.

from co_citation import CoCitation

cites = CoCitation(
    [
        "arxiv:1602.05112",
        "pubmed:8113053",
        "sciencedirect:S0167923610001703",
        "scopus:10.1016/j.cmet.2020.11.014",
    ],
    data_type="journal", # or "article", "institution"
    wait=None, # None or the time to wait between requests (in seconds)
    retries=None, # None or the number of retries for HTTPS requests
    first_last_author=False, # Set to True to only get the institution of the first and last authors
)
cites.write_graph_edges("graph")
cites.plot_graph(
    display=False,
    k=10, # The spacing between the nodes
    seed=42, # Use the seed argument for reproducibility
    margin=dict(b=0, l=110, r=150, t=40)
)
class co_citation.CoCitation(articles_list: List[str], sd_api_key: str = '', graph: str = '', node_weights: str = 'eigenvector', wait: Optional[int] = None, retries: Optional[int] = None, data_type: str = 'journal', first_last_author: bool = False)

Create a co-citation graph

create_citation_graph(articles_list: List[str]) → networkx.classes.graph.Graph
  1. Get the references of each article and their corresponding data (journal, article or institution)

  2. Generate the co-citation pairs and add them to the graph. Each edge weight is the number of times the two items are co-cited.

Parameters

articles_list (list) – The list of article URLs. At the moment only arXiv, ScienceDirect and PubMed are supported

Returns

The graph

Return type

nx.Graph
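
The two steps above can be illustrated with a minimal sketch. It assumes the references have already been fetched and resolved to journal names. The `build_co_citation_weights` helper and the sample data are hypothetical, and a plain dictionary stands in for the `nx.Graph` that the method returns. Whether the library deduplicates items within one reference list is not stated; this sketch does.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def build_co_citation_weights(
    references: Dict[str, List[str]],
) -> Dict[Tuple[str, str], int]:
    """Count how often two items appear together in a reference list.

    `references` maps each source article to the data (journal, article
    title or institution) of the works it cites. This helper is
    hypothetical and is not part of the co_citation package.
    """
    weights: Counter = Counter()
    for cited in references.values():
        # Deduplicate and sort so that (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(cited)), 2):
            weights[pair] += 1
    return dict(weights)


# Made-up sample data: two articles and the journals they cite.
sample = {
    "article-1": ["Nature", "Science", "Cell"],
    "article-2": ["Nature", "Cell"],
}
weights = build_co_citation_weights(sample)
print(weights)
```

Each key of `weights` corresponds to one edge of the co-citation graph and each value to that edge's weight.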

filter_low_co_citations(criteria: int) → None

Remove low-weight edges and isolated nodes

Parameters

criteria (int) – The minimum edge weight kept in the resulting graph
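
As a rough sketch of this filter, the snippet below works on a plain dictionary of edge weights rather than on the graph object. The helper names are hypothetical, and it assumes the minimum is inclusive (edges with a weight equal to `criteria` are kept), which the documentation does not confirm.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def filter_low_edges(edges: Dict[Edge, int], criteria: int) -> Dict[Edge, int]:
    """Keep only the edges whose weight is at least `criteria` (assumed inclusive)."""
    return {edge: weight for edge, weight in edges.items() if weight >= criteria}


def remaining_nodes(edges: Dict[Edge, int]) -> Set[str]:
    """Nodes that still have at least one edge, i.e. are not isolated."""
    return {node for edge in edges for node in edge}


# Made-up edge weights for illustration.
edges = {("Cell", "Nature"): 3, ("Cell", "Science"): 1, ("Nature", "PNAS"): 2}
kept = filter_low_edges(edges, criteria=2)
nodes = remaining_nodes(kept)
print(sorted(nodes))
```

Here "Science" loses its only edge, so it no longer appears among the remaining nodes.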

filter_low_co_citations_nodes(criteria: int) → None

Remove low-weight nodes

Parameters

criteria (int) – The minimum node weight kept in the resulting graph

static gen_perms(citations: List[str]) → List[List[Union[str, int]]]

Get all pairwise permutations of a list, treating (a, b) and (b, a) as the same pair

Parameters

citations (list) – The list of journal citations

Returns

The pairs

Return type

list
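
To show what order-insensitive pairs look like, here is a small sketch built on `itertools.combinations`. The `unordered_pairs` helper is hypothetical. The documented return type, `List[List[Union[str, int]]]`, suggests the real method may attach an integer to each pair, which this sketch does not attempt to reproduce.

```python
from itertools import combinations
from typing import List, Tuple


def unordered_pairs(citations: List[str]) -> List[Tuple[str, str]]:
    """Return every pair of list positions once, ignoring order."""
    return list(combinations(citations, 2))


pairs = unordered_pairs(["Nature", "Science", "Cell"])
print(pairs)
```

A list of n items yields n(n-1)/2 pairs, so three journals give three pairs.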

get_all_elsevier_refs(api_refs_url, refs: List[str]) → List[str]

Get all references for an article indexed in Scopus. The references are paginated in pages of 40, so the function calls itself with the next API page until no page remains.

Parameters
  • api_refs_url (str) – The URL of the Scopus API endpoint that returns the references of an article

  • refs (List[str]) – The list of references

Returns

The references

Return type

List[str]
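
The recursive pagination pattern described above might look like the following sketch. It makes no network calls: the page shape (`"refs"` and `"next"` keys), the `collect_refs` helper and the `FAKE_API` data are all assumptions made for illustration and do not reflect the actual Scopus response format.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical page shape: {"refs": [...], "next": <url or None>}.
Page = Dict[str, Any]


def collect_refs(
    url: Optional[str],
    fetch: Callable[[str], Page],
    refs: List[str],
) -> List[str]:
    """Accumulate references page by page, recursing on the next-page URL."""
    if url is None:
        # No further page: the accumulated list is complete.
        return refs
    page = fetch(url)
    refs.extend(page["refs"])
    return collect_refs(page.get("next"), fetch, refs)


# In-memory stand-in for a paginated API.
FAKE_API: Dict[str, Page] = {
    "page-1": {"refs": ["ref-a", "ref-b"], "next": "page-2"},
    "page-2": {"refs": ["ref-c"], "next": None},
}
all_refs = collect_refs("page-1", FAKE_API.__getitem__, [])
print(all_refs)
```

Passing the accumulator as an argument mirrors the documented signature, where `refs` is both a parameter and the returned list.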

get_article_institution_pubmed(pmid: str) → List[str]

Get the institutions of an article indexed in Semantic Scholar

Parameters

pmid (str) – A PubMed PMID

Returns

The article’s institutions

Return type

List[str]

get_article_institution_scopus(ref: bs4.element.Tag) → List[str]

Get the institutions of authors from a Scopus reference

Parameters

ref (bs4.element.Tag) – An article Scopus reference

Returns

The institutions

Return type

List[str]

get_article_title_pubmed(pmid: str) → str

Get the title of a PubMed article

Parameters

pmid (str) – A PubMed PMID

Returns

The article’s title

Return type

str

get_article_title_scopus(ref) → str

Get the title of an article indexed in Scopus

Parameters

ref (dict) – An article Scopus reference

Returns

The article’s title

Return type

str

static get_article_title_sem_scholar(ref: dict) → str

Get the title of an article indexed in Semantic Scholar

Parameters

ref (dict) – An article reference

Returns

The article’s title

Return type

str

get_citations(article_url: str) → List[str]

Get all citation data for an article

This function does two things:

  1. Get the citations

  2. For each citation, get the data (journal, article or institution)

Parameters

article_url (str) – The URL of the article. At the moment only arXiv, ScienceDirect and PubMed are supported

Returns

The list of citations

Return type

list

get_edge_trace() → List[plotly.graph_objs._scatter.Scatter]

Generate the edge traces. The colors correspond to the edge weights

Returns

The list of edges trace

Return type

list

get_journal_pubmed(pmid: str) → str

Get the journal of a PubMed article

Parameters

pmid (str) – A PubMed PMID

Returns

The journal’s name

Return type

str

get_journal_scopus(ref: bs4.element.Tag) → str

Get the journal of a Scopus article

Parameters

ref (Tag) – A Scopus reference in a BeautifulSoup Tag

Returns

The journal’s name

Return type

str

get_journal_sem_scholar(ref: dict) → str

Get the journal of an article

Parameters

ref (dict) – A Semantic Scholar reference

Returns

The journal’s name

Return type

str

get_node_trace() → dict

Generate the node trace. The colors correspond to the sum of the weights of the edges connected to each node

Returns

The nodes trace

Return type

dict

get_scopus_affiliation(aff_id: str) → str

Get the institution name for a Scopus affiliation id

Parameters

aff_id (str) – A Scopus affiliation id

Returns

The institution name

Return type

str

static init_nodes_weight(graph: networkx.classes.graph.Graph, criteria: str = 'eigenvector') → networkx.classes.graph.Graph

Initialize the node weights.

Parameters
  • graph (nx.Graph) – The graph

  • criteria (str) – The criteria for the weights. Must be one of “eigenvector” or “betweenness”

Returns

The graph with the initialized nodes weight

Return type

nx.Graph
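
For readers unfamiliar with the "eigenvector" criterion, the sketch below approximates eigenvector centrality with a shifted power iteration in plain Python. It is an illustration of the general technique, not the library's implementation, which presumably relies on networkx. The `eigenvector_weights` helper, the fixed iteration count and the edge-dictionary input are assumptions.

```python
import math
from typing import Dict, Tuple


def eigenvector_weights(
    edges: Dict[Tuple[str, str], float], iterations: int = 200
) -> Dict[str, float]:
    """Approximate eigenvector centrality of an undirected weighted graph."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Start from the previous scores (an A + I shift), which keeps
        # the iteration from oscillating on bipartite graphs.
        updated = dict(scores)
        for (u, v), weight in edges.items():
            updated[u] += weight * scores[v]
            updated[v] += weight * scores[u]
        # Rescale to unit Euclidean length so the values stay bounded.
        norm = math.sqrt(sum(value * value for value in updated.values()))
        scores = {node: value / norm for node, value in updated.items()}
    return scores


# Path graph a - b - c: the middle node should score highest.
scores = eigenvector_weights({("a", "b"): 1.0, ("b", "c"): 1.0})
print(scores)
```

A node scores highly when its neighbours score highly, which is why the middle node of the path ends up with the largest weight.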

static load_abbreviations() → Dict[str, str]

Get journal abbreviations

Returns

The abbreviations

Return type

dict

plot_graph(display=True, k=20, seed=42, margin={'b': 0, 'l': 5, 'r': 5, 't': 40}) → None

Plot the co-citation graph

Parameters
  • display (bool) – If True, view the plot in a web browser, else write the plot to disk

  • k (int) – Minimal distance between nodes

  • seed (int) – Seed of the RNG used for the graph layout

  • margin (dict) – Margins around the graph

static rm_dupes(lst: List[str], threshold: int) → List[str]

Remove duplicates in a list that have a Levenshtein distance below the threshold. See https://stackoverflow.com/a/14229397

Parameters
  • lst (List[str]) – A list of strings

  • threshold (int) – The Levenshtein distance below which two strings count as duplicates

Returns

The list of strings with duplicates within the threshold removed

Return type

List[str]
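
The idea of threshold-based deduplication can be sketched as follows. Both helpers are hypothetical and written in plain Python; the library may use a dedicated Levenshtein package and a different strategy for choosing which duplicate to keep. This sketch keeps the first occurrence.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def drop_near_duplicates(items: List[str], threshold: int) -> List[str]:
    """Keep an item only if no already-kept item is closer than `threshold`."""
    kept: List[str] = []
    for item in items:
        if all(levenshtein(item, other) >= threshold for other in kept):
            kept.append(item)
    return kept


names = ["Nature", "nature", "Science", "Sciences"]
unique = drop_near_duplicates(names, threshold=2)
print(unique)
```

With a threshold of 2, strings that differ by a single edit, such as a case change or a trailing "s", are treated as duplicates.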

write_graph_edges(filename: str) → None

Write the edge list to a file

Parameters

filename (str) – The path to the file
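
The on-disk format of the edge list is not documented here. As one possible layout, the sketch below writes a tab-separated row per edge (source, target, weight). The `write_edges` helper, the tab delimiter and the `graph.tsv` file name are assumptions chosen for illustration, not the library's actual output format.

```python
import csv
import os
import tempfile
from typing import Dict, Tuple


def write_edges(edges: Dict[Tuple[str, str], int], filename: str) -> None:
    """Write one `source<TAB>target<TAB>weight` row per edge."""
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Sort the edges so the output is stable between runs.
        for (source, target), weight in sorted(edges.items()):
            writer.writerow([source, target, weight])


# Write a made-up edge to a temporary directory and read it back.
path = os.path.join(tempfile.mkdtemp(), "graph.tsv")
write_edges({("Cell", "Nature"): 2, ("Cell", "J. Biol. Chem."): 1}, path)
with open(path, encoding="utf-8") as handle:
    content = handle.read()
print(content)
```

A tab delimiter is used because journal and institution names often contain spaces.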