SocMap

Quickstart

Launch SocMap as follows:

$ ./socmap.py --authfile auth.txt --userlist userlist.txt --layers 3

Here, auth.txt holds your Twitter API credentials (see the Twitter Authentication section below), userlist.txt contains a list of seed usernames (one per line), and --layers 3 tells SocMap to expand the search three layers out from the seed users.

Usage

usage: socmap.py [-h] [-c] [-l LAYERS] [-n NUMTWEETS] [-w WORKDIR]
                 [-t TWEETDIR] [-m MAPDIR] -a <file> -u <file> [-L <file>]
                 [-d]

A Framework for Social-Network Mapping

optional arguments:
  -h, --help            show this help message and exit
  -c, --compress        Compress downloaded tweets with GZIP
  -l LAYERS, --layers LAYERS
                        How many layers out to download
  -n NUMTWEETS, --numtweets NUMTWEETS
                        How many tweets to download from each user
  -w WORKDIR, --workdir WORKDIR
                        Where to store temporary files
  -t TWEETDIR, --tweetdir TWEETDIR
                        Where to store downloaded tweets
  -m MAPDIR, --mapdir MAPDIR
                        Where to store map data
  -a <file>, --authfile <file>
                        File containing consumer keys and access tokens
  -u <file>, --userlist <file>
                        File containing list of starting usernames
  -L <file>, --logfile <file>
                        Where to store log data relative to workdir (default
                        stdout)
  -d, --debug           Enable debug-level logging

Twitter Authentication

Apps cannot connect to Twitter using a username and password; they must connect using an authentication token.

Follow Twitter’s full guide here, or follow our abbreviated steps:

  1. Log in to your Twitter account at apps.twitter.com
  2. Create a new app
  3. Select the new app and navigate to the “Keys and Access Tokens” panel
  4. Copy the “consumer key”, “consumer secret”, “access token”, and “access token secret”
  5. Put them in a text file (we used auth.txt in the example above), one per line, in the order listed in step 4
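
To illustrate the file layout from step 5, here is a minimal Python sketch that reads the four credentials in that order. The names read_authfile and Credentials are hypothetical, and skipping blank lines is an assumption; this is not SocMap's own parser.

```python
from collections import namedtuple

# Hypothetical container; the field order mirrors step 4 above.
Credentials = namedtuple(
    "Credentials",
    ["consumer_key", "consumer_secret", "access_token", "access_token_secret"],
)


def read_authfile(path):
    """Read four credentials, one per line, skipping blank lines (an assumption)."""
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) != 4:
        raise ValueError(
            "expected 4 credentials in %s, found %d" % (path, len(lines))
        )
    return Credentials(*lines)
```

Failing loudly on a wrong line count catches the most common mistake (a missing or extra line) before any request is sent.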

Directories

SocMap keeps data in three directories. By default, these are:

  • tweets - stores all collected tweets
  • map - stores completed maps for each layer of data collection
  • work - stores temporary data for tracking information between layers of data collection

All directories can be changed with command line options:

  -w WORKDIR, --workdir WORKDIR
                        Where to store temporary files
  -t TWEETDIR, --tweetdir TWEETDIR
                        Where to store downloaded tweets
  -m MAPDIR, --mapdir MAPDIR
                        Where to store map data

Directories will be created automatically if they do not already exist.
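
The create-if-missing behavior can be sketched in Python as follows. The function name ensure_dirs is hypothetical, and the defaults simply mirror the directory names listed above; SocMap's actual implementation may differ.

```python
import os


def ensure_dirs(workdir="work", tweetdir="tweets", mapdir="map"):
    """Create each directory if it is missing and return the three paths."""
    paths = [workdir, tweetdir, mapdir]
    for path in paths:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(path, exist_ok=True)
    return paths
```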

How is Data Stored?

Tweets collected from each user are stored as JSON, and may optionally be compressed with GZIP using -c or --compress.
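
If you want to read the collected tweets back in your own scripts, something like the sketch below may help. It assumes each file holds a single JSON document and detects GZIP compression from the file's first two bytes instead of its name. Both are assumptions, since the exact file layout and naming are not described here, and load_tweets is a hypothetical helper.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every GZIP stream


def load_tweets(path):
    """Load one JSON document from a plain or GZIP-compressed file."""
    with open(path, "rb") as f:
        compressed = f.read(2) == GZIP_MAGIC
    opener = gzip.open if compressed else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```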

Maps of Twitter communities are stored as GML files, a format that can be read by Gephi, Cytoscape, and NetworkX.

We store the following information about a user in their node on the graph:

  • name - Their Twitter username
  • retweeted - Whether the user is included because they were retweeted by another user
  • mentioned - Whether the user is included because they were mentioned by another user
  • layer - How many hops away from the original seed user this user is

We store the following information about a connection in each edge on the graph:

  • retweeted - How many times the source retweeted the destination
  • mentioned - How many times the source mentioned the destination
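
To make the node and edge attributes concrete, the following Python sketch builds them from a list of interactions. The input format (source, destination, kind) and the function name build_graph are hypothetical, and the sketch sets a node's retweeted or mentioned flag whenever anyone in the data retweeted or mentioned that user; how SocMap treats seed users in that respect is not stated here.

```python
from collections import deque


def build_graph(seeds, interactions):
    """Return (nodes, edges) dicts using the attribute names listed above.

    interactions: iterable of (source, destination, kind) tuples, where
    kind is "retweeted" or "mentioned" (a hypothetical input format).
    """
    nodes = {
        s: {"name": s, "retweeted": False, "mentioned": False, "layer": 0}
        for s in seeds
    }
    edges = {}
    neighbors = {}
    for src, dst, kind in interactions:
        # Edge attributes count interactions from source to destination.
        edge = edges.setdefault((src, dst), {"retweeted": 0, "mentioned": 0})
        edge[kind] += 1
        neighbors.setdefault(src, []).append(dst)
        for name in (src, dst):
            nodes.setdefault(
                name,
                {"name": name, "retweeted": False, "mentioned": False, "layer": None},
            )
        nodes[dst][kind] = True

    # Breadth-first search from the seeds assigns each user its hop count.
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        for other in neighbors.get(user, []):
            if nodes[other]["layer"] is None:
                nodes[other]["layer"] = nodes[user]["layer"] + 1
                queue.append(other)
    return nodes, edges
```

Users that cannot be reached from any seed keep a layer of None in this sketch.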

Limitations

Twitter places significant limitations on how much information we can access. In general, SocMap can only see about the last 2000 tweets from any user. This means we will only see recent mentions and retweets between accounts, and can only say how many times a user mentioned or retweeted another within our limited data set.

Twitter also enforces strict rate limits on API usage. When SocMap is rate limited, it blocks until the rate limit period is over, then resumes collection. As a result, for a moderate dataset (~10,000 users), it is not unusual for SocMap to take several days to download data.
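
The block-then-resume behavior can be sketched roughly as follows. The RateLimited exception and call_with_backoff helper are hypothetical stand-ins, not SocMap's or Twitter's API; the sketch only assumes that a rate-limit error tells the caller when the limit resets.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the Unix time at which the limit resets."""

    def __init__(self, reset_at):
        super().__init__("rate limited until %s" % reset_at)
        self.reset_at = reset_at


def call_with_backoff(fetch, now=time.time, sleep=time.sleep):
    """Call fetch(); on RateLimited, sleep until the reset time and retry."""
    while True:
        try:
            return fetch()
        except RateLimited as err:
            # Never pass a negative duration to sleep().
            sleep(max(0.0, err.reset_at - now()))
```

Passing now and sleep as parameters keeps the waiting logic testable without actually pausing.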

Logging

By default, SocMap logs to stdout. You can change this behavior by specifying the path for a logfile with -L or --logfile. The logfile is created relative to the workdir. For example:

$ ./socmap.py -a auth.txt -u userlist.txt -L log.txt

This creates a logfile at ./work/log.txt and writes log messages there instead of stdout.

You can increase the level of logging with -d or --debug to enable debug-level log messages, which are suppressed by default.
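
As a rough illustration of how these options could map onto Python's standard logging module, consider the sketch below. The function name make_logger, the logger name, the message format, and the choice of INFO as the non-debug level are all assumptions made for the example, not SocMap's actual configuration.

```python
import logging
import os
import sys


def make_logger(workdir="work", logfile=None, debug=False):
    """Log to stdout by default, or to workdir/logfile when one is given."""
    logger = logging.getLogger("socmap_sketch")  # hypothetical logger name
    # Assumption: INFO when --debug is absent, DEBUG when it is present.
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if logfile is None:
        handler = logging.StreamHandler(sys.stdout)
    else:
        os.makedirs(workdir, exist_ok=True)
        # The logfile path is resolved relative to the work directory.
        handler = logging.FileHandler(os.path.join(workdir, logfile))
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    # Replace existing handlers so repeated calls do not duplicate output.
    logger.handlers = [handler]
    return logger
```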