Analyzing Social Media Data in Python
Welcome to Analyzing Social Media Data in Python. In this tutorial series, we’re going to analyze Twitter data using Python.
There are millions of tweets created every day from across the entire world, in many different languages. In this course we’re going to learn how to collect Twitter data, how to process Twitter text, how to analyze Twitter networks, and how to map Twitter data geographically.
Let’s get started.
You can’t access everything that happens on Twitter. It may seem obvious to say, but you can only collect information on what people say, not on who is watching passively. Twitter collects data on this internally but doesn’t release it for analysis.
- Can’t collect data on observers
- Can’t collect historical data
- Only a 1% (unverified) sample
Collecting Data through the Twitter API
Using tweepy to collect data:
“tweepy” abstracts away much of the work needed to set up a stable Twitter Streaming API connection.
When you do this in practice, you’re going to have to set up your own Twitter account and API keys for authentication.
Let’s walk through an example:
from tweepy import OAuthHandler
from tweepy import API

# Consumer key authentication
auth = OAuthHandler("GY2FXOVij492hsughn3tV2Lik", "GyLEAXkWACz8glGl2P6YX9bpjbsBI1Qqt0rBUcSKNTdoaFOoZR")

# Access key authentication
auth.set_access_token("1043409037012992000-5gqhMHxZHzobo4TOkeAUt35mfwX8DV", "Zzp0XFwfhUczTkdqdyKC4vwkv8VllGBqBvVJbR31bLoea")

# Set up the API with the authentication handler
api = API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)
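The example above writes the keys directly into the script. Since you will be using your own API keys in practice, one common option is to keep them out of the source file and read them from environment variables instead. The following is only a sketch of that idea: the environment variable names and the helper functions load_credentials and build_api are hypothetical and not part of tweepy. The tweepy calls themselves are the same ones used in the example above.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def build_api(env=os.environ):
    """Build a tweepy API object from credentials stored in the environment."""
    # Imported here so load_credentials can be used without tweepy installed.
    from tweepy import API, OAuthHandler

    creds = load_credentials(env)
    # Consumer key authentication
    auth = OAuthHandler(
        creds["TWITTER_CONSUMER_KEY"],
        creds["TWITTER_CONSUMER_SECRET"],
    )
    # Access key authentication
    auth.set_access_token(
        creds["TWITTER_ACCESS_TOKEN"],
        creds["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    # Set up the API with the authentication handler
    return API(auth)
```

With the four variables set in your shell, calling build_api() returns an API object you can use in place of the api variable from the example above. If any variable is missing, load_credentials raises an error that names it, which is usually easier to diagnose than a failed authentication request.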
See you in the next article.