Launch SocMap as follows:
$ ./socmap.py --authfile auth.txt --userlist userlist.txt --layers 3
auth.txt contains your Twitter API credentials, namely consumer keys and access tokens (see the Twitter authentication section below),
userlist.txt contains a list of seed usernames (one per line), and
--layers 3 tells SocMap to expand the search three layers out from the seed users.
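The --layers option describes a breadth-first expansion: the seed users form the starting layer, accounts connected to them form the next layer, and so on. The sketch below illustrates that idea and is not SocMap's code. get_neighbors is a hypothetical stand-in for whatever lookup SocMap performs (for instance, finding the accounts a user mentioned or retweeted), and counting the seeds as layer 0 is an assumption.

```python
from collections.abc import Callable, Iterable


def expand_layers(
    seeds: Iterable[str],
    get_neighbors: Callable[[str], Iterable[str]],
    layers: int,
) -> dict[str, int]:
    """Breadth-first expansion from seed users.

    Returns a mapping of username -> layer at which it was first reached.
    Seeds are layer 0 (an ASSUMPTION; SocMap may count layers differently).
    """
    depth = {user: 0 for user in seeds}
    frontier = list(depth)
    for layer in range(1, layers + 1):
        next_frontier = []
        for user in frontier:
            for neighbor in get_neighbors(user):
                if neighbor not in depth:  # first time this account is seen
                    depth[neighbor] = layer
                    next_frontier.append(neighbor)
        frontier = next_frontier  # only newly found users are expanded next
    return depth


if __name__ == "__main__":
    # Toy graph standing in for real Twitter lookups.
    graph = {"alice": ["bob"], "bob": ["carol", "alice"], "carol": ["dave"]}
    print(expand_layers(["alice"], lambda u: graph.get(u, []), layers=2))
    # {'alice': 0, 'bob': 1, 'carol': 2}
```

Each account is visited at most once, so the number of lookups grows with the number of distinct users reached rather than with the number of paths between them.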
usage: socmap.py [-h] [-c] [-l LAYERS] [-n NUMTWEETS] [-w WORKDIR]
                 [-t TWEETDIR] [-m MAPDIR] -a <file> -u <file> [-L <file>]
                 [-d]

A Framework for Social-Network Mapping

optional arguments:
  -h, --help            show this help message and exit
  -c, --compress        Compress downloaded tweets with GZIP
  -l LAYERS, --layers LAYERS
                        How many layers out to download
  -n NUMTWEETS, --numtweets NUMTWEETS
                        How many tweets to download from each user
  -w WORKDIR, --workdir WORKDIR
                        Where to store temporary files
  -t TWEETDIR, --tweetdir TWEETDIR
                        Where to store downloaded tweets
  -m MAPDIR, --mapdir MAPDIR
                        Where to store map data
  -a <file>, --authfile <file>
                        File containing consumer keys and access tokens
  -u <file>, --userlist <file>
                        File containing list of starting usernames
  -L <file>, --logfile <file>
                        Where to store log data relative to workdir (default stdout)
  -d, --debug           Enable debug-level logging
Apps cannot connect to Twitter with a username and password; they must authenticate with OAuth credentials (consumer keys and access tokens).
Follow Twitter’s full guide here, or follow our abbreviated steps:
Save your consumer keys and access tokens in a text file (auth.txt in the example above), one on each line, in the requested order.
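Reading a credentials file like this takes only a few lines. This is a minimal sketch, not SocMap's parser: read_auth_file and TwitterCredentials are hypothetical names, and the field order below (consumer key, consumer secret, access token, access token secret) is an assumption, so follow whatever order SocMap actually requests.

```python
from pathlib import Path
from typing import NamedTuple


class TwitterCredentials(NamedTuple):
    # Field order is an ASSUMPTION; match the order SocMap asks for.
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


def read_auth_file(path: str) -> TwitterCredentials:
    """Read four credentials, one per line, ignoring blank lines."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    values = [line for line in lines if line]
    if len(values) != 4:
        raise ValueError(f"{path}: expected 4 non-empty lines, found {len(values)}")
    return TwitterCredentials(*values)
```

Failing loudly on the wrong number of lines catches a truncated or mis-pasted file before any API call is attempted.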
SocMap keeps data in three directories. By default, these are:
Each directory can be changed with a command-line option:
  -w WORKDIR, --workdir WORKDIR
                        Where to store temporary files
  -t TWEETDIR, --tweetdir TWEETDIR
                        Where to store downloaded tweets
  -m MAPDIR, --mapdir MAPDIR
                        Where to store map data
Directories will be created automatically if they do not already exist.
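Creating directories on demand needs one pathlib call per directory. The sketch below shows the idea; ensure_dirs is a hypothetical helper, not part of SocMap.

```python
from pathlib import Path


def ensure_dirs(*dirs: str) -> list[Path]:
    """Create each directory (and any missing parents) if it does not exist."""
    paths = [Path(d) for d in dirs]
    for path in paths:
        # exist_ok=True makes this a no-op for directories that already exist.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Because the call is idempotent, it is safe to run at every startup, including when resuming an interrupted collection.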
Tweets collected from each user are stored as JSON and can optionally be compressed with GZIP using -c/--compress.
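Python's standard json and gzip modules are enough to write and read such files. This is a sketch under assumptions: the per-user file name (<username>.json, or <username>.json.gz when compressed) and storing each user's tweets as a single JSON array are illustrative choices, not SocMap's documented on-disk layout.

```python
import gzip
import json
from pathlib import Path


def tweet_path(tweetdir: str, username: str, compress: bool) -> Path:
    # Hypothetical naming scheme; SocMap's actual file names may differ.
    suffix = ".json.gz" if compress else ".json"
    return Path(tweetdir) / f"{username}{suffix}"


def save_tweets(tweetdir: str, username: str, tweets: list, compress: bool = False) -> Path:
    path = tweet_path(tweetdir, username, compress)
    # gzip.open in text mode ("wt") compresses transparently as we write.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(tweets, f)
    return path


def load_tweets(path: Path) -> list:
    # Pick the decompressor from the file extension.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Tweet JSON is highly repetitive text, so GZIP typically shrinks it substantially at the cost of a little CPU time on read and write.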
We store the following information about a user in their node on the graph:
We store the following information about a connection in each edge on the graph:
Twitter places significant limitations on how much information we can access. In general, SocMap can only see about the last 2000 tweets from any user. This means we will only see recent mentions and retweets between accounts, and can only say how many times a user mentioned or retweeted another within our limited data set.
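Counting interactions within a user's downloaded tweets is straightforward. The sketch below tallies how many times each other account was mentioned or retweeted. It assumes tweets in the Twitter API v1.1 JSON shape (entities.user_mentions[].screen_name for mentions, retweeted_status.user.screen_name for retweets); count_interactions is a hypothetical name, and this is not necessarily how SocMap computes its edge data.

```python
from collections import Counter


def count_interactions(tweets: list[dict]) -> tuple[Counter, Counter]:
    """Return (mentions, retweets): Counters keyed by lowercase screen name."""
    mentions: Counter = Counter()
    retweets: Counter = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original is not None:
            # A retweet ("RT @user: ...") also lists the original author in
            # user_mentions; count it once, as a retweet, to avoid double counting.
            retweets[original["user"]["screen_name"].lower()] += 1
            continue
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            # Screen names are case-insensitive, so normalize before counting.
            mentions[mention["screen_name"].lower()] += 1
    return mentions, retweets
```

Because only the most recent tweets are visible, these counts describe recent activity within the downloaded sample, not a user's full history.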
Twitter also enforces strict rate limits on API usage. When SocMap is rate-limited, it blocks until the rate-limit window resets and then resumes collection. As a result, it is not unusual for SocMap to take several days to download a moderate dataset (~10,000 users).
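Blocking until a rate-limit window resets can be sketched as a retry loop. Twitter's v1.1 API signals a rate limit with HTTP status 429 and reports the reset time, in epoch seconds, in the x-rate-limit-reset response header. The request callable here is a hypothetical stand-in that returns (status, reset_epoch, body); it is not SocMap's API, and the one-second safety margin is an arbitrary choice.

```python
import time
from typing import Any, Callable, Tuple

# (HTTP status, rate-limit reset time in epoch seconds, response body)
Response = Tuple[int, float, Any]


def seconds_until_reset(reset_epoch: float, now: float, margin: float = 1.0) -> float:
    """How long to sleep so the next request lands after the window resets."""
    return max(0.0, reset_epoch - now) + margin


def request_blocking(
    request: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> Any:
    """Call request() until it is not rate limited, sleeping through each window."""
    while True:
        status, reset_epoch, body = request()
        if status != 429:  # 429 Too Many Requests means rate limited
            return body
        sleep(seconds_until_reset(reset_epoch, clock()))
```

Injecting sleep and clock keeps the loop testable without actually waiting out a 15-minute window.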
By default, SocMap logs to stdout. To log to a file instead, pass its path with -L/--logfile; the path is interpreted relative to the workdir. For example:
$ ./socmap.py -a auth.txt -u userlist.txt -L log.txt
This creates the logfile ./work/log.txt (inside the default workdir) and writes log messages there instead of to stdout.
To increase the level of logging, pass -d/--debug to enable debug-level log messages, which are suppressed by default.
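The logging behavior described above maps onto Python's standard logging module. This is a sketch of one way to get it, not SocMap's code: configure_logging is a hypothetical name, INFO as the non-debug level is an assumption, and the workdir is assumed to exist already.

```python
import logging
import sys
from pathlib import Path
from typing import Optional


def configure_logging(workdir: str, logfile: Optional[str] = None, debug: bool = False) -> None:
    """Log to stdout, or to a file resolved relative to workdir."""
    level = logging.DEBUG if debug else logging.INFO  # INFO default is an ASSUMPTION
    if logfile is None:
        handler: logging.Handler = logging.StreamHandler(sys.stdout)
    else:
        # e.g. workdir "./work" and logfile "log.txt" -> ./work/log.txt
        handler = logging.FileHandler(Path(workdir) / logfile)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, handlers=[handler], force=True)
```

For the example command above, this would be called as configure_logging("work", "log.txt"), with debug=True added when --debug is given.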