Yo, what it is! You know what it is, it’s your man Kingmusa, and welcome to The Study Guide! I'm here to break down today's class notes and help us learn together. Today we are going over Social Media Analytics & AI. Let's dive into our modules on Social Media Analytics & Big Data. We're moving beyond just using social media to understanding the power of big data, how it's analyzed, and where the data comes from.
Key Concept of the Day:
Today, we're focusing on the fundamentals of big data and its relevance to social media analytics. This includes understanding how the digital transformation has led to a massive increase in data and how this data is impacting various fields. We'll also touch on the interdisciplinary nature of big data as a field of study, and the different sources of social media data and the methods used to collect it.
This week’s module focuses on data sources, collection methods, cleaning, and organization. The workflow is: identify a problem, collect data, clean and prepare it, then analyze it. Social media data is unstructured and requires cleaning before analysis. Information collection points include social media platforms and external sources, and technological, legal, and ethical limitations apply to all of them. There is a manual approach for small projects and an automated approach for larger ones; project goals, size, and resources determine the best method. Built-in dashboards provide insights into engagement, demographics, and content performance. Platforms like Facebook, Instagram, and YouTube offer analytics tools for reach, engagement, and audience behavior. Social media platforms also provide APIs for data access.
Students will use APIs to collect data from platforms like YouTube, TikTok, and LinkedIn for their final project. APIs process requests and provide responses. Web scraping extracts data from websites when API access is limited. Data crawling systematically browses and indexes web pages. Social media researchers and analysts use various sources for data collection. SaaS platforms offer cloud-based access to real-time social media data, automated reporting, AI-driven insights, and cross-platform integration. Public archives and historical data sources store past social media data for trend analysis. User-generated content is valuable for sentiment analysis, engagement monitoring, and trend forecasting.
Social media monitoring tools track and analyze social media activity. An API, like a messenger, connects devices and apps, enabling communication and information sharing. For instance, it facilitates online travel services by communicating with airline systems and displaying booking options. APIs enable data sharing between systems, crucial for businesses making informed decisions. Social media platforms collect user data for personalized ads and content. However, data collection can be challenging due to platform restrictions, time periods, and data privacy.
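To make the request-and-response idea concrete, here is a minimal Python sketch of what "asking" an API for YouTube comments and "reading" the answer might look like. It assumes the YouTube Data API v3 `commentThreads` endpoint; the function names, `VIDEO_ID`, `YOUR_API_KEY`, and the sample response are placeholders made up for illustration, and nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
BASE_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_request_url(video_id, api_key, page_token=None):
    """Build the request: the 'question' we send to the API."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,  # the API caps each page at 100 comment threads
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # Ask for the next page of results when the previous response gave us a token.
        params["pageToken"] = page_token
    return BASE_URL + "?" + urlencode(params)


def parse_response(body):
    """Read the response: the 'answer' the API sends back as JSON."""
    data = json.loads(body)
    comments = []
    for item in data.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        comments.append({
            "author": snippet.get("authorDisplayName", ""),
            "text": snippet.get("textDisplay", ""),
            "likes": snippet.get("likeCount", 0),
            "published": snippet.get("publishedAt", ""),
        })
    return comments, data.get("nextPageToken")


# Hand-written sample response (made up for illustration, not real data).
SAMPLE_BODY = json.dumps({
    "nextPageToken": "PAGE2",
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {
            "authorDisplayName": "viewer_one",
            "textDisplay": "Great explanation!",
            "likeCount": 3,
            "publishedAt": "2025-01-15T12:00:00Z",
        }}}},
    ],
})

url = build_request_url("VIDEO_ID", "YOUR_API_KEY")
comments, next_token = parse_response(SAMPLE_BODY)
print(url)
print(comments[0]["text"], next_token)
```

With a real key and video ID, the URL could be fetched with `urllib.request.urlopen(url)` and the body passed to `parse_response`; repeating the request with each returned page token is how a collector would work toward a larger sample. Platform quotas and terms of service still apply.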
This concept is important because: Understanding big data and how to collect and analyze it is crucial in today's digital world. It's transforming how businesses operate, how research is conducted, and how we understand online behaviors and trends on social media.
Here are the main points:
- The Significance of Big Data: Big data has become crucial due to the digital transformation and the widespread use of the internet, significantly impacting the global economy. The surge of social media has dramatically increased the generation and spread of data. Big data is an interdisciplinary field, integrating mathematics, statistics, informatics, communication, and business. Social media analytics uses this vast data to study online behavior and trends.
- Sources and Collection Methods: Social media data originates from platforms like Facebook, X, YouTube, and external sources accessed through APIs (Application Programming Interfaces). Data collection methods include manual collection, using built-in analytics pages, APIs, web scraping (when API access is limited), and third-party tools (like SaaS platforms and social media monitoring tools).
- The Power of APIs: APIs are essential for accessing structured data from social media platforms, acting as messengers that connect devices and apps for communication and information sharing. They enable data sharing between systems, which is vital for informed business decisions. Social media platforms utilize APIs to provide data access while also collecting user data for personalization. However, data collection can face challenges due to platform restrictions, time limitations, and data privacy concerns.
- Ethical Considerations and Platform Policies: When collecting and using social media data, it's crucial to adhere to ethical guidelines and be aware of platform policies. For the final project, understanding the platform's terms and privacy policies is essential. Avoid excessive data collection to prevent privacy issues and be mindful of potential biases in the data. Resources like API tutorials, data scraping tools, and online communities can provide learning and support.
- The Communalytic Project and Data Cleaning: The final project proposal is due February 16th and is built around Communalytic, a user-friendly social media data collector designed for researchers, journalists, and students that enables public interest research without coding. It helps define communities, collect social media data, and analyze audience demographics and engagement. Communalytic is free for educational use (with a 30,000 record limit), with a Pro upgrade for larger datasets. The project focuses on collecting data from YouTube due to its accessibility. Students will create a YouTube API key via the Communalytic website (Module 4 on D2L) to collect YouTube comments, aiming for at least 5,000 comments on a chosen topic for the proposal. A crucial step is data cleaning, which involves removing irrelevant fields (like IDs, dates, authors, links) to focus on the comment text for analysis. Students will clean 500 comments from their larger dataset, making sure the comments are complete and meaningful and that relevant elements within the text, like emojis and hyperlinks, are kept. Both the original and cleaned sets are submitted.
- Spreadsheets for Data Analysis: To simplify data analysis, spreadsheets (like Google Sheets and MS Excel) are valuable digital notebooks for storing, sorting, and analyzing data in rows and columns. They excel at math, data organization, and real-time data changes. Google Sheets offers analysis, visualization (with tools like Google Data Studio), and collaboration features. Data in spreadsheets is typically divided into dimensions (attributes) and metrics (measurements), allowing for easy filtering, pivoting, and aggregation. Spreadsheets also provide data visualization tools like time series and pie charts.
- Twitter Analytics and Data Manipulation: Twitter analytics offers insights into audience engagement and post performance. Tweets contain data points like author, timestamp, text, media, mentions, and hashtags. Users can personalize timelines and categorize tweets with hashtags. Interactions include retweeting (promoting content), favoriting, and replying. Reply networks often have central communicators. In Excel, a tweet dataset can be prepared for analysis by wrapping text, freezing the top row, sorting by engagement to find the top tweets, and using the "Remove Duplicates" feature.
- Software and Ethical Considerations in Research: Social media research methods include direct platform access and third-party services. Finding and evaluating data collection software requires balancing research needs with platform terms and data requirements, ensuring data sharing with researchers is compliant. While web scraping exists, APIs are preferred because platforms control data access through their terms of service. Software and web applications simplify API data extraction, even for non-programmers. Researchers must manage collected data responsibly, adhering to platform policies and considering ethical implications, as they don't own the data. Terms of Service and developer agreements outline rules for data collection, storage, sharing, and reporting. Secure data storage (encrypted drives, password protection, controlled cloud access) is crucial to prevent unauthorized access, with plans for long-term storage or data expiration after the project. Reporting should focus on dataset characteristics, excluding individuals of public interest, to maintain security and confidentiality. Social media datasets include user-generated data and metadata (identifiers, timestamps, engagement, followers). Data can be collected using APIs, Python, R, and SaaS tools. The module introduces installing and using R and RStudio for data analysis, specifically the RedditExtractoR package for extracting data (posts and comments) from Reddit. RStudio's interface includes panes for code editing (Source), running code (Console), managing data and packages (Environment), and installing/managing packages (Packages). The install.packages("RedditExtractoR") command installs the package. Functions like get_thread_content extract comments, and data can be saved as CSV files. You can also collect posts and comments from specific subreddits based on keywords.
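The data-cleaning step from the main points above (keep the comment text, drop everything else, keep only complete and meaningful comments) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's required method: the field names (`id`, `date`, `author`, `text`), the three-word threshold for "meaningful", and the sample rows are all made up.

```python
import re

# Made-up rows standing in for an exported comment dataset.
raw_rows = [
    {"id": "c1", "date": "2025-01-10", "author": "a", "text": "Love my new e-bike! 🚲"},
    {"id": "c2", "date": "2025-01-10", "author": "b", "text": "   "},
    {"id": "c3", "date": "2025-01-11", "author": "c", "text": "Love my new e-bike! 🚲"},
    {"id": "c4", "date": "2025-01-12", "author": "d", "text": "ok"},
    {"id": "c5", "date": "2025-01-12", "author": "e",
     "text": "Full review here:\nhttps://example.com/review"},
]

MIN_WORDS = 3  # assumed threshold for "meaningful"; adjust for your topic


def clean_comments(rows, limit=500):
    """Keep only the comment text, dropping blanks, very short comments, and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace and line breaks; emojis and links are left untouched.
        text = re.sub(r"\s+", " ", row.get("text", "")).strip()
        if len(text.split()) < MIN_WORDS:
            continue  # blank or too short to analyze
        key = text.casefold()
        if key in seen:
            continue  # same comment already kept (case-insensitive match)
        seen.add(key)
        cleaned.append(text)
        if len(cleaned) == limit:
            break  # stop once the target sample size is reached
    return cleaned


cleaned = clean_comments(raw_rows)
for text in cleaned:
    print(text)
```

The IDs, dates, and authors never reach the cleaned list, while emojis and hyperlinks inside the text survive. The duplicate check does the same job as Excel's "Remove Duplicates" on the text column. Keep the original file untouched, since both the original and cleaned sets are submitted.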
In a nutshell, these modules introduce you to the fundamental concepts of big data and its growing importance in the realm of social media analytics, highlighting its interdisciplinary nature, transformative impact, and the methods for collecting and utilizing social media data effectively and ethically. This chapter delves into social media research methods, including direct platform access and third-party services. It covers the process of finding and evaluating software for data collection, balancing research requirements with platform terms and data needs, and ensuring data sharing with researchers. While web scraping can be effective, APIs are preferred due to data access control through terms of service. Software packages and web applications simplify data extraction from APIs, even for non-programmers. Researchers must manage collected data, adhere to platform policies, and consider ethical implications. Social media data isn’t owned by researchers, so they must handle datasets responsibly. Terms of Service and developer agreements outline data collection, storage, sharing, and reporting rules. Data storage is crucial to prevent unauthorized access, so researchers should use encrypted hard drives, password-protected computers, and control cloud storage access. After the project, plan for long-term storage or set an expiration date for the data. When reporting data, focus on dataset characteristics, excluding individuals of public interest. Maintaining dataset security and confidentiality is paramount.
Social media datasets include user-generated data and metadata such as identifiers, timestamps, engagement metrics, and follower counts. Researchers can collect data using APIs, Python, R, and SaaS tools. To begin, download and install R and RStudio from the official website (https://posit.co/download/rstudio-desktop/). RStudio has four main panes: Source, Console, Environment, and Packages. The Source pane is for editing code, the Console pane is for running code and seeing results, the Environment pane is for managing data and packages, and the Packages pane is for installing and managing packages. To install RedditExtractoR, run install.packages("RedditExtractoR") in RStudio. It’s an R package that extracts data from Reddit, including posts and comments. RedditExtractoR outputs a data frame containing columns such as timestamp, title, subreddit, and URL. For instance, you can search for posts and comments in the r/Biking subreddit to gain insights into e-bikes, road bikes, and maintenance. To use RedditExtractoR, extract comments from posts using the get_thread_content function and save them as a CSV file. Alternatively, you can collect posts and comments from a particular subreddit based on specific keywords.
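Once the posts are saved as a CSV file, they can be explored in a spreadsheet or in code. As a rough sketch in Python (one of the tools the notes list alongside R), the snippet below filters posts by a keyword and counts them per subreddit, treating `subreddit` as a dimension and the count as a metric. The column names follow the ones mentioned above, but the exact headers in a real export are an assumption, and the sample rows and helper names are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up CSV content standing in for a file saved from R.
SAMPLE_CSV = """timestamp,title,subreddit,url
1736500000,Best e-bike for commuting?,Biking,https://example.com/1
1736503600,Road bike chain maintenance tips,Biking,https://example.com/2
1736507200,E-bike battery range in winter,ebikes,https://example.com/3
"""


def load_posts(handle):
    """Read CSV rows into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def filter_by_keyword(posts, keyword):
    """Keep posts whose title contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [p for p in posts if needle in p["title"].casefold()]


def count_by(posts, dimension):
    """Aggregate: number of posts (metric) for each value of a dimension."""
    return Counter(p[dimension] for p in posts)


posts = load_posts(io.StringIO(SAMPLE_CSV))
ebike_posts = filter_by_keyword(posts, "e-bike")
print(len(ebike_posts))
print(count_by(posts, "subreddit"))
```

For a real export, replace the `io.StringIO(...)` call with something like `open("posts.csv", newline="", encoding="utf-8")`, where `posts.csv` is a hypothetical file name. This is the same filter-and-pivot idea a spreadsheet pivot table gives you.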
That wraps up today’s episode of The Study Guide. Remember, we teach to learn, and I hope this has helped you understand the material better. Keep studying, keep learning, and keep pushing toward your academic goals. Don’t forget to follow me on all platforms @Kingmusa428 and check out more episodes at kingmusa428.com. See y’all next time!