Launch SocMap as follows:
$ ./socmap.py --authfile auth.txt --userlist userlist.txt --layers 2
Here `auth.txt` contains your Twitter API credentials (consumer keys and access tokens; see the Twitter authentication section below), `userlist.txt` contains a list of seed usernames (one per line), and `--layers 2` tells SocMap to expand the search two layers out from the seeds.
After SocMap has run, it will produce output files in the `map` folder, with names like `layer2.gml`. These text output files contain the graph information, and are suitable for visualization in Gephi or Cytoscape, among other network analysis tools.
    usage: socmap.py [-h] [-c] [-l LAYERS] [-n NUMTWEETS] [-M MAXREFERENCES]
                     [--ignorementions | --ignoreretweets] [-w WORKDIR]
                     [-t TWEETDIR] [-m MAPDIR] -a <file> -u <file> [-L <file>]
                     [-d]

    A Framework for Social-Network Mapping

    optional arguments:
      -h, --help            show this help message and exit
      -c, --compress        Compress downloaded tweets with GZIP
      -l LAYERS, --layers LAYERS
                            How many layers out to download
      -n NUMTWEETS, --numtweets NUMTWEETS
                            How many tweets to download from each user
      -M MAXREFERENCES, --maxreferences MAXREFERENCES
                            Maximum number of retweeted and mentioned users to
                            track per user
      --ignorementions      Do not follow mentions during mapping
      --ignoreretweets      Do not follow retweets during mapping
      -w WORKDIR, --workdir WORKDIR
                            Where to store temporary files
      -t TWEETDIR, --tweetdir TWEETDIR
                            Where to store downloaded tweets
      -m MAPDIR, --mapdir MAPDIR
                            Where to store map data
      -a <file>, --authfile <file>
                            File containing consumer keys and access tokens
      -u <file>, --userlist <file>
                            File containing list of starting usernames
      -L <file>, --logfile <file>
                            Where to store log data relative to workdir (detault
                            stdout)
      -d, --debug           Enable debug-level logging
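The usage text shows that `--ignorementions` and `--ignoreretweets` cannot be combined, and that `-a` and `-u` are required. The following is a minimal `argparse` sketch of how a subset of such an interface could be declared. It is not SocMap's actual source: the function name `build_parser`, the `int` types, and the absence of defaults are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of part of the socmap.py command line."""
    parser = argparse.ArgumentParser(
        prog="socmap.py",
        description="A Framework for Social-Network Mapping",
    )
    # Assumption: numeric options are parsed as integers; real defaults are unknown.
    parser.add_argument("-l", "--layers", type=int,
                        help="How many layers out to download")
    parser.add_argument("-M", "--maxreferences", type=int,
                        help="Maximum number of retweeted and mentioned "
                             "users to track per user")

    # The "[--ignorementions | --ignoreretweets]" notation in the usage line
    # corresponds to a mutually exclusive group: at most one may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--ignorementions", action="store_true",
                       help="Do not follow mentions during mapping")
    group.add_argument("--ignoreretweets", action="store_true",
                       help="Do not follow retweets during mapping")

    # Options shown without square brackets in the usage line are required.
    parser.add_argument("-a", "--authfile", metavar="<file>", required=True,
                        help="File containing consumer keys and access tokens")
    parser.add_argument("-u", "--userlist", metavar="<file>", required=True,
                        help="File containing list of starting usernames")
    return parser
```

With this sketch, `build_parser().parse_args(["-a", "auth.txt", "-u", "userlist.txt", "--layers", "2"])` yields `layers == 2`, while passing both `--ignorementions` and `--ignoreretweets` makes `argparse` exit with an error.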
Apps cannot connect to Twitter using a username and password; they must connect using an authentication token.
Follow Twitter’s full guide here, or follow our abbreviated steps:
`auth.txt` in the example above), one on each line, in the above order.

SocMap keeps data in three directories. By default, these are:
All directories can be changed with command line options:
    -w WORKDIR, --workdir WORKDIR
                          Where to store temporary files
    -t TWEETDIR, --tweetdir TWEETDIR
                          Where to store downloaded tweets
    -m MAPDIR, --mapdir MAPDIR
                          Where to store map data
Directories will be created automatically if they do not already exist.
Tweets collected from each user are stored as JSON, and may optionally be compressed with GZIP using `-c` or `--compress`.
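Because tweets may be stored either as plain or GZIP-compressed JSON, analysis scripts need to handle both. The sketch below shows one way to do so. The helper name `load_tweets`, the `.gz` suffix for compressed files, and the idea that each file holds a single JSON document are assumptions; SocMap's actual file naming and layout may differ.

```python
import gzip
import json


def load_tweets(path):
    """Load one user's tweets from a JSON file, compressed or not.

    Assumption: compressed files end in ".gz" and each file contains
    exactly one JSON document.
    """
    # gzip.open and the built-in open accept the same text-mode arguments.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Selecting the opener by file suffix keeps the calling code identical whether or not `--compress` was used during collection.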
Maps of Twitter communities are stored as GML files, a format that can be read by Gephi, Cytoscape, and NetworkX.
We store the following information about a user in their node on the graph:
We store the following information about a connection in each edge on the graph:
Twitter places significant limitations on how much information we can access. In general, SocMap can only see about the last 2000 tweets from any user. This means we will only see recent mentions and retweets between accounts, and can only say how many times a user mentioned or retweeted another within our limited data set.
Twitter also enforces strict rate limits on API usage. When SocMap is rate limited, it blocks until the rate limit period is over, then resumes collection. This means that for a moderate dataset (~10,000 users), it is not unusual for SocMap to take several days to download data.
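The block-then-resume behavior described above can be pictured with the following sketch. It is not SocMap's implementation: the `RateLimited` exception, its `reset_at` attribute, and the `call_with_backoff` helper are hypothetical names, and the `sleep` and `now` parameters exist only so the logic can be exercised without real waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the Unix time at which the limit resets."""

    def __init__(self, reset_at):
        super().__init__("rate limited")
        self.reset_at = reset_at


def call_with_backoff(fetch, *args, sleep=time.sleep, now=time.time):
    """Call fetch(*args); if rate limited, block until the reset time and retry."""
    while True:
        try:
            return fetch(*args)
        except RateLimited as err:
            # Wait out the remainder of the rate limit window, never a negative time.
            sleep(max(0.0, err.reset_at - now()))
```

A collection loop wrapped this way makes no progress while rate limited, which is why large datasets can take days to download.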
You can reduce the download time by ignoring either mentions or retweets during data collection, or by placing a limit on how many references per user to track:
    -M MAXREFERENCES, --maxreferences MAXREFERENCES
                          Maximum number of retweeted and mentioned users to
                          track per user
    --ignorementions      Do not follow mentions during mapping
    --ignoreretweets      Do not follow retweets during mapping
For example, `-M 100 --ignorementions` will collect only 100 retweeted accounts per user, and will ignore mentions entirely.
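To illustrate how these options shrink the set of accounts followed from each user, here is a sketch of the selection step. The function `select_references` is hypothetical, and it assumes the most frequently referenced accounts are kept when a limit applies; the source does not say which accounts SocMap keeps.

```python
from collections import Counter


def select_references(retweeted, mentioned, max_references=None,
                      ignore_mentions=False, ignore_retweets=False):
    """Return the referenced usernames to follow for one user.

    retweeted / mentioned: lists of usernames, one entry per retweet or mention.
    """
    counts = Counter()
    if not ignore_retweets:
        counts.update(retweeted)
    if not ignore_mentions:
        counts.update(mentioned)
    # Assumption: keep the most frequently referenced accounts first.
    # most_common(None) returns every account, so no limit means no pruning.
    return [name for name, _ in counts.most_common(max_references)]
```

Fewer followed accounts per user means fewer API requests in the next layer, which is where the time savings come from.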
By default, SocMap logs to stdout. You can change this behavior by specifying the path for a logfile with `-L` or `--logfile`. The logfile is created relative to the workdir. For example:
$ ./socmap.py -a auth.txt -u userlist.txt -L log.txt
This will create a logfile at `./work/log.txt` and put log messages there instead of stdout.
You can enable debug-level log messages, which are suppressed by default, with `-d` or `--debug`.
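The logging behavior described here (stdout by default, a file relative to the workdir when requested, debug messages only on demand) could be set up with Python's standard `logging` module roughly as follows. The function `configure_logging`, the logger name, and the message format are assumptions for illustration, not SocMap's actual code.

```python
import logging
import os
import sys


def configure_logging(workdir, logfile=None, debug=False):
    """Return a logger that writes to stdout or to workdir/logfile."""
    logger = logging.getLogger("socmap_sketch")  # hypothetical logger name
    # Debug-level messages are suppressed unless explicitly enabled.
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.handlers.clear()
    if logfile is None:
        handler = logging.StreamHandler(sys.stdout)
    else:
        # The logfile path is resolved relative to the working directory.
        os.makedirs(workdir, exist_ok=True)
        handler = logging.FileHandler(os.path.join(workdir, logfile))
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Under these assumptions, `configure_logging("./work", "log.txt")` writes to `./work/log.txt`, and adding `debug=True` also records debug-level messages.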
SocMap comes with a number of ancillary tools for analyzing downloaded Twitter data. These include:
Tool | Description |
---|---|
splitRetweetsAndMentions.py | Splits the graph into graphs of retweet relationships and mention relationships, to be analyzed separately |
sortNodeDegrees.py | Displays a list of users in a network sorted by degree |
removeLowDegreeNodes.py | Prunes nodes with a degree below a specified threshold, to shrink network maps until visualization tools can process them |
getInsularity.py | For a list of users, returns what percentage of retweets by users on the list are retweets of other users on the list |
mergeMaps.py | Combines two map files, creating a union of users and social relationships |
pruneUsers.py | Removes a list of users from a map, and leaves only users reachable from seeds |
pruneTweets.py | Removes users with a low level of activity from a map |
pruneRetweets.py | Removes links between users unless there are a sufficient number of retweets |
pruneMentions.py | Same as pruneRetweets, but for mentions |
searchTweets.py | Search through downloaded tweets for a regular expression |
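As an illustration of the insularity measure described in the table, the following sketch computes the percentage of retweets by listed users that are retweets of other listed users. It is not the code of `getInsularity.py`: the function name, the `(retweeter, retweeted, count)` input format, and the handling of an empty result are assumptions.

```python
def insularity(retweets, users):
    """Percentage of retweets by listed users that retweet other listed users.

    retweets: iterable of (retweeter, retweeted, count) tuples (assumed format).
    users: the list of usernames whose insularity is measured.
    """
    members = set(users)
    total = 0
    internal = 0
    for source, target, count in retweets:
        if source not in members:
            continue  # only retweets made by listed users are counted
        total += count
        if target in members and target != source:
            internal += count
    # Assumption: report 0.0 when the listed users made no retweets at all.
    return 100.0 * internal / total if total else 0.0
```

For instance, if listed users made four retweets in total and three of them were of other listed users, the function returns 75.0.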