Radio Station

Study Guide: Social Media Analytics & AI - Module 6

Yo, what it is! You know what it is, it’s your man Kingmusa, and welcome to The Study Guide! I’m here to break down today’s class notes and help us learn together. Today we’re going over Module 6: Text Analytics I (Thematic Analysis). We’re exploring what text analytics is, how it’s used, and why it’s important.

 

Key Concept of the Day: 

Today, we’re focusing on the basics of text analytics: how it’s used to draw meaning from written communication, the different techniques involved, and how it’s applied in the real world. We’ll also touch on using Voyant Tools for analysis. Module 6 explores Thematic Analysis, a popular method for identifying important themes in text from social media platforms. Quick reminder: the data collection assignment deadline is next week. Text analytics has advanced with new techniques and specialized tools. The analytics cycle involves collecting, cleaning, analyzing, visualizing, and interpreting data to uncover insights, and visualization is vital for presenting large datasets. Textual data, which makes up about 80% of organizational data, requires careful analysis and storage.


Social Chain, Europe’s top influencer marketing agency, partners with influential individuals to promote products or brands to their followers on social media. Crimson Hexagon provides comprehensive data and insights, enabling data-driven decision-making in social media campaigns. The platform simplifies reporting, helps justify conclusions, and delivers accurate, data-backed reports after campaigns, leading to better results. It also offers valuable insights into client products and audience behavior. Large corporations use sophisticated software like this to manage vast amounts of data. Text data such as tweets, comments, reviews, and other written communication is central to social media research.


Social media analytics gathers and analyzes data from platforms to help businesses make decisions. Text analytics extracts meaning, themes, sentiments, and opinions from written communication. Text mining analyzes unstructured text data to find valuable information. Text analytics transforms messy text into a language that computers can understand. We analyze vast amounts of text data to find patterns and insights, akin to a treasure hunt where we uncover hidden gems in social media posts, emails, articles, and more.


Text analytics is crucial for understanding the vast amounts of text data generated online. It helps businesses gain insights from customer feedback, analyze social media trends, and make informed decisions.


Here are the main points:

  1. Text analytics involves extracting, analyzing, and interpreting hidden business insights from textual elements.
  2. It has roots in data mining, machine learning, and natural language processing.
  3. Key steps include identification and searching, parsing, cleaning, and filtering text, and analysis.
  4. Techniques include frequency analysis, keyword identification, association mining, clustering, classification, and sentiment analysis.
  5. Voyant-Tools is a free web-based text reading and analysis environment.
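
To make the key steps in point 3 a little more concrete, here's a minimal sketch in Python of parsing, cleaning, filtering, and a simple frequency analysis. The stop-word list, the function names, and the sample text are all made up for illustration; real tools ship much longer stop-word lists and smarter tokenizers.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "in", "for", "this", "so"}

def tokenize(text):
    """Parse: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def clean(tokens):
    """Clean and filter: drop stop words and one-letter tokens."""
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]

def top_terms(text, n=3):
    """Analyze: count how often each remaining term appears."""
    return Counter(clean(tokenize(text))).most_common(n)

# Made-up sample text standing in for a few product reviews.
sample = "The battery is great. Battery life is the best! The screen is great too."
print(top_terms(sample))
```

Even on this tiny sample, "battery" and "great" rise to the top once the filler words are gone, which is the whole idea behind frequency analysis.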

Text analytics is used in healthcare, law, government, and finance to analyze market trends, identify product issues, and compare competitors. It builds on text mining, a collaborative effort between data scientists, NLP experts, and machine learning whizzes. Text analytics also connects to data mining, databases, and library science, drawing on information retrieval, information extraction, and more. Natural Language Processing (NLP) breaks down sentences, identifies parts of speech, removes irrelevant words, checks for spelling mistakes, and groups related concepts. However, it may struggle with context and human emotions, leading to inaccurate interpretations. Text transformation, using techniques like TF-IDF or word embeddings, turns text into numbers that computers can work with. Frequency analysis identifies important words or ideas based on how often they appear. Keyword identification summarizes content and highlights main topics. Data cleaning ensures accurate analysis and visualization. Association mining finds patterns for predictions and insights into user behavior. Clustering groups similar objects in unstructured data, making it easier to understand and analyze. Text clustering groups similar texts based on criteria like positivity or importance in network analysis.
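
Since TF-IDF came up, here's a rough sketch of how it can be computed by hand. It assumes one common variant (term count divided by document length, times the log of total documents over documents containing the term); libraries often use slightly different formulas. The three short documents are invented for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    which is one common variant among several.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Invented mini-corpus for illustration.
docs = [
    "chatgpt helps students write",
    "students worry about chatgpt bias",
    "students love word clouds",
]
scores = tf_idf(docs)
for term, value in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {value:.3f}")
```

Notice that "students" shows up in every document, so its score drops to zero, while a word that is unique to one document scores highest there. That is how TF-IDF separates distinctive terms from merely frequent ones.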

Google’s search algorithm groups web pages based on keyword relevance and displays the most relevant results first. Classification categorizes unstructured text using natural language processing and assigns predefined labels. Voyant Tools, a free online tool, offers features like word clouds, text reading, highlighting, trends, summaries, context, collocations, and frequency charts. It is a user-friendly tool for digital humanities students, scholars, and the general public, enabling text data loading, analysis, visualization, and exploration. It’s ideal for beginners learning text analysis, studying open access texts, and adding visualizations to research essays. Users can choose from pre-loaded corpora or upload their own text. Voyant offers visualizations like word clouds, trend charts, and a text reader for data exploration. Stream graphs show term frequency and trends across documents. Users can interact with visualizations by changing sizes, saving URLs or images, and adding them to presentations or assignments. Voyant encourages students to explore and present data creatively in educational assignments. The module also covers a study that examines public opinion and ethical issues surrounding ChatGPT in higher education using social media data.

Generative AI, particularly ChatGPT, impacts education. The study aims to understand public perceptions of AI use through social media data. A mixed-methods approach revealed five main themes: authenticity, integrity, creativity, productivity, and research. Public opinion was mostly positive, but concerns about academic integrity and ethical considerations persisted. Integrating ChatGPT in education raises ethical concerns, requires digital literacy education, and impacts student learning and assessment. Generative AI, powered by deep learning models like GANs and VAEs, revolutionizes content creation. However, ethical concerns arise regarding the authenticity of AI-generated content. ChatGPT, a generative AI tool, can write human-like content, assist in story creation, and engage in deep conversations. Generative AI finds applications in healthcare and finance, such as personalized medicine, drug discovery, fraud detection, and improved customer service. To address misuse and ensure responsible AI use, new rules and copyright laws are necessary. In education, generative AI personalizes learning materials, creates educational content, and enhances interactivity. Social media platforms like X serve as real-time communication hubs for expressing opinions on various topics. Public opinion on AI’s role in education varies, from excitement to skepticism. The study analyzes tweets about ChatGPT’s impact on higher education, identifying the main topics and the sentiment people express. Using a cleaned dataset of 5655 tweets from December 1, 2022, to September 2023, from public and individual accounts (excluding bots and spam), it employs Leximancer for thematic analysis and Voyant Tools for visualization and analysis.

SentiStrength, a free tool, uses an algorithm to determine text sentiment. It works with multiple languages and handles informal language. It uses a list of words with scores from -5 to 5 to indicate sentiment strength. However, it’s not as accurate as some advanced models and relies on a predefined word list. The thematic analysis identified five main themes: authenticity, integrity, creativity, productivity, and research. Authenticity involves creativity, originality, truthfulness, and accuracy in ChatGPT’s responses. Integrity means ethics, reliability, and respecting user privacy. Creativity is evident in ChatGPT’s ability to generate new ideas. ChatGPT simplifies tasks, boosts efficiency, and saves time, enhancing productivity. SentiStrength found that 46.6% of the text had a positive sentiment, 38.5% had a neutral sentiment, and 14.8% had a negative sentiment. Users are concerned about ChatGPT’s potential for incorrect answers, which could be problematic for students who rely on it. Voyant Tools analyzed the data and identified important words and patterns. Ethical concerns include data privacy, security, potential bias in AI-generated content, its impact on critical thinking, reliance on AI-generated answers, integration with other tools, handling complex questions, effects on academic integrity, and changing creativity in AI. While users are optimistic about ChatGPT’s potential benefits, they also worry about misuse, bias, and its impact on academic work. To address these concerns, clear ethical guidelines for AI use are essential. These guidelines cover data privacy, bias reduction, and responsible development. Educators need to understand AI, its capabilities, biases, and privacy concerns. Transparency and trust are crucial when using AI tools: clearly explain data usage and decision-making processes to build student and faculty trust. Ethical AI usage is paramount. Monitor and evaluate AI tools for data privacy and biases, and set clear ownership rules for AI-generated content. People are positive about AI in education for personalized learning and productivity, neutral about information sharing, and negative in discussions around ethical frameworks and responsible use. Involve educators, policymakers, and tech developers to ensure AI integration improves learning while protecting academic values.
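
To get a feel for the word-list approach to sentiment described above, here's a minimal sketch. To be clear, this is not SentiStrength's actual algorithm or word list. It just assumes a tiny made-up lexicon with scores from -5 to 5, sums the scores per text, and reports the share of positive, neutral, and negative texts. The tweets are invented.

```python
import re

# Hypothetical mini-lexicon; scores run from -5 (very negative) to 5 (very positive).
LEXICON = {"love": 4, "great": 3, "helpful": 2, "wrong": -2, "cheating": -4, "hate": -4}

def score(text):
    """Sum the lexicon scores of the words in a text; unknown words count as 0."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

def label(text):
    """Map the summed score to positive, neutral, or negative."""
    total = score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

def distribution(texts):
    """Share of texts per label, as percentages."""
    labels = [label(t) for t in texts]
    return {name: 100 * labels.count(name) / len(labels)
            for name in ("positive", "neutral", "negative")}

# Invented example tweets.
tweets = [
    "I love ChatGPT, it is so helpful",
    "ChatGPT gave me a wrong answer",
    "Trying ChatGPT for my essay today",
    "Great tool for brainstorming",
]
print(distribution(tweets))
```

The sketch also shows the limitation mentioned above: any word missing from the list counts as zero, so sarcasm, slang, or context the lexicon doesn't know about slips right past it.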

The study focused on English-language tweets, potentially overlooking other languages and cultures. The dataset is also biased because it is predominantly US-based. Analyzing other social media platforms could provide a broader understanding. The study was funded by Stephen F. Austin State University. Its references explore AI’s impact on creative industries and generative AI in education, including ChatGPT’s influence on medical education, art creation, academic integrity, ethical leadership, the industry, higher education, AI-based learning content generation, and education adaptation. ChatGPT’s popularity surpasses TikTok and Instagram. One cited study analyzed Indonesian COVID-19 pandemic tweets using SentiStrength. Another explores GANs in human-AI collaborative applications for creative industries. ChatGPT significantly impacts content creators and AI developers due to its YouTube content interactions. Li et al. (2023) used Leximancer to understand social media concerns about ChatGPT in education. Lund et al. (2023) examined the ethical implications of ChatGPT and similar large language models in scholarly publishing. Nguyen (2023) explored data framing risks in big data and AI news media coverage. Öztürk and Ayvaz (2018) analyzed Twitter data on the Syrian refugee crisis. Rudolph, Tan, and Tan (2023) explored ChatGPT’s impact on traditional assessments in higher education, especially for quantitative research data in tourism.

We’ll explore ChatGPT’s potential in research, including data generation, qualitative analysis, consumer engagement, and study aid. Wu and Yu’s 2023 meta-analysis found AI chatbots impact student learning outcomes. Wang et al. explored blockchain for risk prediction and credibility assessment in online public opinion. Vilares et al. developed a Spanish sentiment analysis tool for real-time political tweet analysis. Text analytics, a multidisciplinary field, extracts business insights from social media content. Organizations use it for business intelligence, transforming unstructured text data into quantitative data for analysis. Healthcare, government, and education use text analytics for tasks like email filtering, fraud detection, and opinion mining. Text analytics analyzes textual data using computational and humanistic approaches, delivering clear, interpretable results and actionable outcomes across various business departments.

Open-source tools like Voyant Tools analyze text data by identifying, searching, extracting, preprocessing, analyzing, and interpreting it. Identifying the appropriate text source is crucial due to the dynamic nature of social media text. Text preprocessing removes stop words, applies stemming and lemmatization, and corrects spelling and other textual errors. Data cleaning removes unwanted terms, nonsensical comments, and irrelevant data. Text analysis parses, cleans, and filters text, creating a dictionary of words using NLP to extract meanings. Text transformation converts text into numbers for analysis. Various text analysis techniques, such as clustering, association, classification, predictive analysis, and sentiment analysis, help find insights in the text. Frequency analysis, or term frequency (TF) analysis, quickly identifies the most frequent words. Keyword analysis finds common and important words and expressions for summarization and topic identification. Association analysis estimates how likely items are to co-occur in documents. Clustering groups similar objects. Voyant Tools features include word frequency lists, frequency distribution plots, and KWIC (keyword in context) analysis. Users can upload text in various formats or use sample corpora. Data visualization tools like Cirrus (word cloud) and Reader (text display with frequency highlighting) aid in analyzing data.
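
KWIC (keyword in context) is easy to picture with a small example. The sketch below is not how Voyant Tools implements it; it's just a minimal version that assumes whitespace-separated words and a fixed window of words on each side. The function name and sample text are made up.

```python
def kwic(text, keyword, window=3):
    """Return each occurrence of keyword with `window` words of context per side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing, ignoring case.
        if word.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

# Invented sample text.
text = ("Medicare funding was debated. Voters asked about Medicare coverage "
        "and clean energy programs.")
for left, word, right in kwic(text, "medicare", window=2):
    print(f"{left:>20} [{word}] {right}")
```

Lining up every hit with its neighbors like this lets you see at a glance how a term is actually being used, instead of just how often it appears.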

Cirrus displays a word cloud of common words, with the number of words adjustable by slider. Users can read words, hover over them for frequency information, and search or compare terms. Voyant also shows term frequency distribution plots and word trends, and lets users toggle word visibility. Corpus statistics include document count, word count, vocabulary density, and average sentence length. The Collocates Tool analyzes terms near a specific keyword, showing frequently occurring terms and finding relationships and semantic connections. It can work alone or be easily embedded in other websites, and users can export the analysis results for use elsewhere. Voyant Tools, an open-source online app, analyzes digital texts using data visualization. In one case, information professionals loaded a digitized, OCR-ed corpus into Voyant Tools, which helped them find consistent keywords for complete metadata. This speeds up descriptive metadata work for cataloging, archiving, digital humanities, and social science research. Automated text analysis efficiently assigns subject metadata to prepare collections for research, especially in political text studies, classification, scaling, text reuse, and natural language processing. Political scientists can use automated text analysis for quantitative text analysis, especially for focused or large-volume texts. Researchers should verify results and use multiple tools from the automated text analysis toolkit. Voyant Tools aids cross-validation and provides multiple perspectives, but humans are still needed because computers have a limited understanding of context.
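
Two of the ideas above, vocabulary density and collocates, can be sketched in a few lines. This assumes vocabulary density means unique words divided by total words, and that collocates are simply the words within a fixed window around the keyword. Voyant's own calculations may differ in the details, and the function names and sample text here are made up.

```python
import re
from collections import Counter

def tokens_of(text):
    """Lowercase the text and pull out alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def vocabulary_density(text):
    """Unique word forms divided by total words (a type/token ratio)."""
    tokens = tokens_of(text)
    return len(set(tokens)) / len(tokens)

def collocates(text, keyword, window=2):
    """Count the words that appear within `window` tokens of each keyword hit."""
    tokens = tokens_of(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text.
text = "Clean energy policy matters. Energy policy shapes energy prices."
print(round(vocabulary_density(text), 2))
print(collocates(text, "energy", window=1).most_common(3))
```

A lower density means more repetition, and the collocate counts hint at which words travel together. Here "policy" keeps showing up next to "energy".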

Over 25 visualization formats help users understand text data, but all data, including visual data, has inherent bias due to human involvement. While Voyant Tools itself is unbiased in data cleaning, it can’t eliminate user or corpus bias. This study used Voyant Tools to assign descriptive metadata to political correspondence. It analyzed text documents, identified repeated words, and visualized data to show main ideas. PDF documents from the Nunnelee collection were converted to Word documents using Adobe’s OCR software. A custom stop-word list excluded broad words and abbreviations but retained common names for metadata creators. The study focused on the Nunnelee corpus’s “aboutness,” examining common topics and subjects in letters. Data cleaning removed words like “including” and “provide,” but kept “Congress” and “Representative.” Two independent reviewers checked 100 random documents using Voyant Tools to ensure accuracy. They used tools like Cirrus (word cloud), Summary, Trends, Reader, Context, Collocates-Links, and Terms Berry. Cirrus showed word relationships, while Terms Berry grouped top words with different shades of pink to show usage frequency. Applying the stop-word list in Cirrus revealed specific terms like “public,” “energy,” and “medicare,” related to health legislation. The Nunnelee collection focused on health legislation, with words like “act,” “health,” and “program” appearing frequently. Topics included environmental issues, energy sources, Native American rights, Medicare, and clean energy programs. Subject headings and keywords included energy policy, the US, health insurance, the Keystone Pipeline, Medicare, and the Clean Energy Act.

In a nutshell, this module introduces the core concepts of text analytics and its applications, providing a foundation for understanding how to analyze and interpret textual data. Voyant Tools is a tool for describing and analyzing text, finding patterns and connections in data. However, cleaning up data and dealing with OCR mistakes can be time-consuming. To maximize its use, use specific keywords, avoid outdated language, and ensure easy-to-understand subject headings. Also, know state abbreviations for better searching and use full place names when possible. Voyant Tools helps find main ideas, connect them to words or phrases, and show important people or things in a collection. 

Librarians and information professionals should test Voyant Tools with familiar collections and gradually apply it to unfamiliar ones to understand its effectiveness. It should enhance text data analysis, not just confirm existing knowledge. Future researchers should improve usability, discoverability, and compatibility with other software and data types. Voyant Tools hasn’t created new data or analyzed new information. Graser and Burel (2018) wrote a book on metadata automation, Hendrigan (2018) wrote a paper on the convergence of digital humanities and STEM librarianship, and Lee, Kim, and Kim (2010) studied research trends in digital libraries using text mining and profiling methods. Another cited study analyzed patient experience comments from a primary care survey using Voyant Tools. The manifesto corpus is a new resource for researching political parties and quantitative text analysis.

Voyant Tools is a popular text mining software tool for digital humanities projects. Text mining is used in literature analysis, metadata creation, and political science research. Voyant Tools and Hermeneutica are two popular text mining software tools. Text mining automates metadata generation, analyzes large datasets, and finds language patterns. Political science research is challenging with large-scale computerized text analysis. Bibliographic records can be a good research data source. This is a review of Voyant Tools, a web-based text analysis tool.

That wraps up today’s episode of The Study Guide. Remember, we teach to learn, and I hope this has helped you understand Module 6: Text Analytics better. Keep studying, keep learning, and keep pushing toward your academic goals. Don’t forget to follow me on all platforms @Kingmusa428 and check out more episodes at kingmusa428.com. See y’all next time!

