Profile Image

Tools & Techniques for Digital Humanities (2019W)


070112 UE Course in Methodology - Tools & Techniques for Digital Humanities (2019W) — University of Vienna, Department of History; Instructor Dr. Maxim G. Romanov


L13 Social Network Analysis (SNA) — Basics through Gephi

Goals:

  1. Basics of Social Network Analysis.
  2. Basics of Gephi.

Software

Files & Scripts

Class

  • Basic explanations
  • Hands-on tutorial

SNA Concepts

  • Networks
    • Patterns of relationships that connect individuals, institutions, or objects (or leave them unconnected)
    • Examples: family ties—vertical and horizontal; patronage—who pays whom for what services; shared membership in organizations; paterns of contracts among companies.
  • Main elements
    • Node (vertex, point, item)—person, organization, country, document, etc.
    • Edge (tie, link, line)—relationship, friendship, common attendance, co-occurance, kinship, membership
      • Directed and Undirected links: marriage—undirected; patronage—directed.
    • Graphs (Sociograms) / Matrices (pl. of matrix)
  • Graph properties
    • Density: number of edges / number of edges in a complete graph (example of complete and incomplete graph)
    • Centrality measures
      • Degree: Proportional to the number of other nodes to which a node is links – Number of links divided by (n-1). For example: it may measure exposure to what is flowing through the network (opportunity to influence directly; in-degree and out-degree).
      • Closeness – The sum of shortest paths to all other points in the graph. Divide by (n-1), then invert. For example: it may measure how quickly information can travel.
      • Betweenness – The extent to which a particular point lies ‘between’ other points in the graph; how many shortest paths (geodesics) is it on? A measure of brokerage or gatekeeping (How often a node lies on the shortest path between two other nodes)
      • Eigenvector – A weighted measure of centrality that takes into account the centrality of other nodes to which a node is connected. That is, being connect with other central nodes increases centrality. E.g., secretary of powerful person. Google’s page rank algorithm is based on a variation of this approach.
    • Modularity—interconnected clusters.

Gephi Data Format (Basic)

Nodes file

A CSV file with as many columns as you might need, but one must be id. These ids must be used in your edges data file in fields source and target.

Often scholars use numbers as ids in the edges table, while providing all the robust relevant information on the edges in the nodes table. This allows to accommodate as must information on the nodes as one might need (for example, coordinates, if nodes are locations; of affiliations, if nodes are people). This also helps to make edges file significantly smaller, as the information on nodes is not repeated.

Edges file

A CSV file with three columns:

1) source 2) target 3) weight

Lots of tutorials for Gephi can be found here: https://gephi.wordpress.com/.

Let’s try “Star Wars”

Download files with nodes and edges from the top of the page. Let’s try to load them into Gephi.

Collecting SNA data from “Dispatch”

Essentially, you can use the same script that you used to collect toponymic data from articles; here, however, the lines of code that collected toponymic data from a given artice should be replaces with the lines of code that collect network data:

1) collect all relevant items; remove duplicates; 2) convert them into edges (see, function edges() below); 3) after all the edges are collected, you will need to count their frequencies; 4) the result should be a table (or, a csv/tsv file) with three columns: source, target, weight; 5) weight value can be counted in a number of different ways; by and large, it is up to you, as it depends on what specifically you want to measure.

Code snippets

NB: edgesDic = {} must be declared in the very beginning of the script.

# updating dictionary
def updateDic(dic, key):
    if key in dic:
        dic[key] += 1
    else:
        dic[key]  = 1

# creating edges from a list
import itertools
def edges(edgesList, edgesDic):
    edges = list(itertools.combinations(edgesList, 2))
    for e in edges:
        key = "\t".join(sorted(list(e))) # A > B (sorted alphabetically, to avoid cases of B > A)
        updateDic(edgesDic, key)

# collectign tagged toponyms into a network
def collectTaggedToponyms(xmlText, dic, date):
    xmlText = re.sub("\s+", " ", xmlText)

    topList = []

    for t in re.findall(r"<placeName[^<]+</placeName>", xmlText):
        t = t.lower()

        if re.search(r'"(tgn,\d+)', t):
            reg = re.search(r'"(tgn,\d+)', t).group(1)

            if reg in mapData:
                topList.append(reg)

    # generating edges and updating their freqs
    edges(topList, edgesDic)

Reference Materials:

Some interesting examples

Homework:

  1. [Easy, relatively] Create a network of places in the “Dispatch”—if places are mentioned in the same article, they must be connected one way or the other—we can use that as the basis for creating edges. Generate a vizualisation in Gephi, only here you will need to create a geographical network, i.e., a network put over a real map: use Geo Layout plugin (your file with the data on nodes must also include coordinates). Describe your observations on your website as a separate blog post.
  2. [A bit harder: optional] Create a network of people mentioned in the “Dispatch” (if individuals are mentioned in the same article, they must be connected one way or the other—we can use that as the basis for creating edges). Generate a vizualisation in Gephi. Describe your observations on your website as a separate blog post.

Submitting homework:

  • Homework assignment must be submitted by the beginning of the next class;
  • Email your homework to the instructor.
    • if your homework is to create a file, email it as an attachment
    • if your homework is a blogpost on your website, email the link to your website and to the blogpost with your homework.
    • In the subject of your email, please, add the following: 070112-LXX-HW-YourLastName-YourMatriculationNumber, where LXX is the lesson for which the homework is submitted, YourLastName is your last name, and YourMatriculationNumber is your matriculation number.