Posted by Daniel Frank, September 1st, 2010

How We Did It: The Development of Trendrr v3’s Sentiment Analysis Engine

A feature we’re excited about for Trendrr v3 is a revamped sentiment analysis engine that processes the Twitter conversation. Sentiment analysis automatically determines whether a tweet expresses a positive, negative or neutral sentiment towards a brand or product. Our engine estimates the relative frequencies of positive, negative and neutral sentiment with an accuracy above the 90th percentile, in the sense explained later in this post.

Our engineering team referred to Go, Bhayani, and Huang’s “Twitter Sentiment Classification using Distant Supervision” in developing our approach. Machine learning algorithms for this type of problem follow a basic pattern: collect some pre-categorized data (training data), and use its features to define the parameters of the model. The model takes unknown data as input, parses it for the same features found in the training data, and outputs its most likely category. The model’s parameters determine what this prediction will be. In general, the more training data obtained, the more accurate the model.

Similar to the methods of Go, Bhayani and Huang, we used the presence of single words or pairs of words (e.g. whether a tweet contains the word “bad” or the phrase “not bad”) as our features, and built our model with the Maximum Entropy algorithm. The Maximum Entropy algorithm attempts to classify data by making as few assumptions as possible while respecting the observations from the training data. We gathered training data by collecting tweets containing emoticons and treating the emoticon as a rough label for the tweet’s sentiment, which let us obtain a large training set without requiring people to label each tweet by hand. This training data is particularly useful because it is drawn from actual Twitter posts, which employ a distinctive vocabulary.
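To make the feature and labeling ideas above concrete, here is a minimal Python sketch. It is not our production code: the emoticon lists, the function names and the choice to strip emoticons from the features (so the label does not leak into the input) are all assumptions made for illustration.

```python
import re

# Hypothetical emoticon sets; the post does not list the ones actually used.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", "=("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def emoticon_label(tweet):
    """Return 'positive' or 'negative' based on emoticons, else None."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:  # no emoticon at all, or conflicting emoticons
        return None
    return "positive" if has_pos else "negative"


def presence_features(tweet):
    """Return the set of single words and adjacent word pairs in a tweet."""
    # Assumption: drop emoticon tokens so the label is not also a feature.
    kept = [t for t in tweet.split() if t not in ALL_EMOTICONS]
    words = re.findall(r"[a-z']+", " ".join(kept).lower())
    unigrams = set(words)
    bigrams = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return unigrams | bigrams
```

For the tweet "Not bad at all :)", `emoticon_label` returns "positive" and `presence_features` contains both the word "bad" and the pair "not bad", which is exactly the distinction a single-word model would miss.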

All of this training data, however, is taken from texts that contain opinion. Our models were therefore operating without ever having seen a tweet that did not express any sentiment, and to accurately identify neutral tweets we had to augment our training data. We adopted a technique from sentiment classification of movie reviews called the “Hierarchical Classifier” (see Pang and Lee in the references). The classifier first determines whether a tweet is objective or subjective, and then categorizes as positive or negative only those tweets that express an opinion.
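The two-stage idea can be sketched in a few lines of Python. The keyword-based stand-ins below are purely hypothetical placeholders for the two trained models; only the control flow (subjectivity first, polarity second) reflects the approach described above.

```python
from typing import Callable

# Toy word lists standing in for trained models (illustrative only).
OPINION_WORDS = {"love", "hate", "great", "awful"}
NEGATIVE_WORDS = {"hate", "awful"}


def toy_is_subjective(tweet: str) -> bool:
    """Stage 1 stand-in: does the tweet express any opinion?"""
    return any(w in OPINION_WORDS for w in tweet.lower().split())


def toy_polarity(tweet: str) -> str:
    """Stage 2 stand-in: positive or negative, assuming an opinion exists."""
    words = tweet.lower().split()
    return "negative" if any(w in NEGATIVE_WORDS for w in words) else "positive"


def hierarchical_classify(
    tweet: str,
    is_subjective: Callable[[str], bool] = toy_is_subjective,
    polarity: Callable[[str], str] = toy_polarity,
) -> str:
    """Label objective tweets 'neutral'; send the rest to the polarity model."""
    if not is_subjective(tweet):
        return "neutral"
    return polarity(tweet)
```

Because the polarity model is only ever asked about tweets already judged subjective, it can be trained on the emoticon data alone, while the first stage is the only part that needs examples of objective text.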

We made other adjustments to fine-tune our models for use with Twitter, such as accounting for URLs and usernames. We standardized all URLs and usernames (i.e. replaced them with the words URL or USERNAME) so that they would be recognized as recurring features. We also removed retweets from the training data, as these duplicates would introduce some bias.
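A rough Python sketch of this preprocessing is shown here. The regular expressions and the retweet test are assumptions chosen for illustration; real tweets contain edge cases (email addresses, unusual URL forms) that these patterns do not handle.

```python
import re

# Illustrative patterns, not the exact rules used in production.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USERNAME_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"^RT\b|\bRT @\w+")


def normalize(tweet):
    """Replace every URL and @username with a shared placeholder token."""
    tweet = URL_RE.sub("URL", tweet)  # URLs first, in case they contain '@'
    return USERNAME_RE.sub("USERNAME", tweet)


def is_retweet(tweet):
    """Heuristic: treat tweets using the 'RT' convention as retweets."""
    return bool(RETWEET_RE.search(tweet.strip()))


def prepare_training_tweets(tweets):
    """Drop retweets, then normalize what remains."""
    return [normalize(t) for t in tweets if not is_retweet(t)]
```

With these rules, "@bob check http://t.co/abc now" becomes "USERNAME check URL now", so every link and every mention maps to the same recurring feature instead of thousands of one-off tokens.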

We also added the ability to restrict our training data by language. One word may have an opposite meaning in a different language, potentially reducing the accuracy of our model. Twitter provides language metadata on its tweets, but we found it to be inaccurate. Instead, we employed another language classification technique that uses character frequency to determine the most likely language. This approach is ideal for our purposes because its accuracy is not compromised by short-form text like tweets. After making adjustments to account for Twitter-specific idioms, we were able to verify that this language classifier met our expectations.
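As an illustration of character-frequency language identification, the Python sketch below compares character-pair counts in a tweet against per-language profiles. This is one common way to do it, not necessarily the technique we deployed: the two tiny sample texts, the use of character pairs and the cosine similarity measure are all assumptions, and a real profile would be built from far more text.

```python
import math
from collections import Counter


def char_profile(text, n=2):
    """Count overlapping character n-grams in lowercased, space-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy reference texts (made up for this example).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog "
          "and this is what the weather is like today",
    "es": "el rápido zorro marrón salta sobre el perro perezoso "
          "y así está el tiempo hoy en la ciudad",
}
PROFILES = {lang: char_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language whose profile is most similar to the text."""
    profile = char_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

Even a short string carries a few dozen character pairs, which is why this style of classifier tends to hold up on tweet-length text where word-level methods have little to work with.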

To measure the effectiveness of our categorizer, we must compare its results with tweets that have been classified by an actual human. The simplest metric of effectiveness is to compute the percentage of tweets categorized correctly. Having classified the tweets by hand, we know the relative frequencies of each category. As such, we can compute our expected accuracy if we were to categorize tweets at random according to this same distribution. With the distribution of our test data, the expected success rate would be 38%. The model, on the other hand, correctly classifies 60% of all tweets. This represents a significant improvement over the random case.
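The random baseline can be computed directly. If a tweet’s true category is i with probability p_i, and we guess category i with the same probability p_i independently of the truth, the chance of a match is the sum of the squared probabilities. The Python sketch below shows this; the counts in the example are invented for illustration and are not our test set, although they happen to give a baseline of the same size.

```python
def expected_random_accuracy(distribution):
    """Chance of a correct guess when guessing at random from the true distribution.

    `distribution` may hold probabilities or raw counts; it is normalized first.
    """
    total = sum(distribution)
    return sum((p / total) ** 2 for p in distribution)


# Hypothetical hand-labeled counts: (neutral, positive, negative).
hypothetical_counts = (50, 30, 20)
baseline = expected_random_accuracy(hypothetical_counts)  # about 0.38
```

Note that the baseline depends on how uneven the distribution is: three equally likely categories give one third, while a heavily skewed set pushes the figure higher, so the baseline should always be computed from the test data at hand.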

Our users will be more interested in how close the aggregate classifications are to their ‘true’ values than in the sentiment of any one tweet. The questions we are trying to answer for our clients include: what fraction of tweets is positive, what fraction is negative, and what fraction is neutral? These relative frequencies can be viewed as a discrete distribution on one random variable taking three possible values. Each of the three relative frequencies can take any value between 0 and 1, so long as they sum to 1. Geometrically, the set of allowable relative frequencies describes a surface in the unit cube (three variables with one constraint). We represent this set as the portion of the surface of the unit sphere with all coordinates positive; one standard way to obtain such a representation is to take the square root of each frequency, so that the squared coordinates sum to 1.
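For readers who want to experiment with this geometric picture, here is a small Python sketch. It assumes the square-root mapping, a common way of placing probability distributions on a unit sphere that this post does not spell out, and it measures distance along the surface of the sphere. Both choices, and the function names, are assumptions.

```python
import math


def to_sphere(freqs):
    """Map relative frequencies to a point on the unit sphere.

    Assumption: each coordinate is the square root of a normalized
    frequency, so the squared coordinates sum to 1.
    """
    total = sum(freqs)
    return tuple(math.sqrt(f / total) for f in freqs)


def sphere_distance(p, q):
    """Great-circle distance between two frequency distributions."""
    dot = sum(a * b for a, b in zip(to_sphere(p), to_sphere(q)))
    # Clamp to guard against tiny floating-point overshoot before acos.
    return math.acos(min(1.0, max(-1.0, dot)))
```

Two identical distributions are at distance 0, and two distributions that put all their weight on different categories are a quarter circle apart (pi/2), which is the largest distance possible within this region.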

What does this have to do with our accuracy? The point associated with the human-measured frequencies and the point associated with the frequencies generated by our automatic classifier lie 0.057 apart on the surface of the sphere. Is this close enough to the true frequency distribution? The set of points within 0.057 of the point for the human-measured frequencies makes up less than 0.6% of the total area of the spherical section that represents all possible frequency distributions. To account for possible error in human measurement, we must consider that the true distribution may not equal the human-measured distribution. A reasonable estimate of this error puts our classifier’s distribution at most twice this distance from the true distribution, but possibly much closer.
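The area argument can be checked approximately with a short Python sketch. It assumes distances are measured along the surface of the unit sphere and that the neighborhood of a point is a full spherical cap lying inside the all-positive region; the exact calculation behind the figures in this post is not given, so treat this as a back-of-the-envelope check rather than a reproduction.

```python
import math

# The all-positive region is one eighth of the unit sphere (total area 4*pi).
OCTANT_AREA = math.pi / 2


def cap_fraction(radius):
    """Fraction of the all-positive region within `radius` of a point.

    Uses the spherical cap area 2*pi*(1 - cos(radius)) and ignores any
    clipping at the edges of the region (an assumption).
    """
    cap_area = 2 * math.pi * (1 - math.cos(radius))
    return cap_area / OCTANT_AREA


measured_distance = 0.057
worst_case = cap_fraction(2 * measured_distance)  # about 0.026
percentile = 100 * (1 - worst_case)               # about 97.4
```

Under these assumptions the doubled distance covers roughly 2.6% of the region, in line with a figure in the high 90s as a percentile. The same formula gives about 0.65% for the single distance of 0.057, slightly above the figure quoted above, so the original calculation presumably differed in some detail such as rounding of the distance or clipping at the boundary.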

We can thus estimate that the frequency distribution produced by our categorizer is closer to the true distribution than about 97% of all possible distributions, placing it in the 97th percentile. While this exact number may depend on the data set in question, any results near this accuracy paint a clear picture of overall sentiment across the web. We find this very encouraging, and expect that our results will only improve as we expand our training data set. Furthermore, now that we have laid the groundwork in our implementation, we can easily apply the Maximum Entropy algorithm to other classification problems, and hope to extend it to deeper problems such as consumer intent.

References

1. Alec Go, Richa Bhayani and Lei Huang, “Twitter Sentiment Classification using Distant Supervision,” http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf

2. Bo Pang and Lillian Lee, “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts,” http://www.cs.cornell.edu/home/llee/papers/cutsent.home.html

Posted by Shawn Smith, August 5th, 2010

Business Development Internship at Trendrr: Fall 2010

Trendrr is a New York-based business intelligence web service for digital and social media.

We are seeking an intern to join our startup to assist with business development, market research and product management. You will have first-hand exposure to how a fast-growing small business manages its day-to-day strategy, operations, and client acquisition.

Skills / Experience Required:

  • Strong analytical skills
  • Written and oral communication skills
  • Ability to work in a team environment and complete multiple tasks quickly and effectively
  • Self-starter combining a high level of business acumen with strong organization skills
  • Currently enrolled in college or holding a college degree

Big plus:

  • Prior startup experience
  • Exposure to data / statistical analysis
  • Enjoys building an innovative web application that makes customers happy

Role Description:

  • Client relationship management
  • Reporting and social media analysis
  • Project management within application
  • Market / competitive research

Hours:

We are looking for candidates who can work at least 20 hours per week during the fall 2010 semester. Schedules can be tailored with flexibility around school and other activities. We offer a minimal daily stipend and we will work with your institution to provide college credit for your time (when applicable).

Apply:

Please email careers@trendrr.com with your:

  • Resume
  • Cover letter
  • Links to your blog, Twitter, and other social media accounts

    Posted by Shawn Smith, July 29th, 2010

    Programming/Development Internship at Trendrr: Fall 2010

    Programming Internship: Trendrr

    Social media tracking service Trendrr is seeking intelligent and highly motivated interns to be part of the Trendrr team at digital agency Wiredset for the fall 2010 semester. Trendrr tracks trends across a variety of data sources, ranging from social networks to blog buzz, tweets and YouTube video views. Trendrr enables users to easily monitor the popularity of, and audience engagement around, brands, TV shows and films, and political campaigns, among other applications.

    Trendrr is looking for great programming interns. Interns will have the opportunity to work on the API and other key areas of trendrr.com. We utilize cutting-edge techniques including Ajax and advanced JavaScript. Programming interns will be responsible for implementing and/or designing new widgets and UI elements using jQuery, Ajax, and Facelets. Interns will also work with graphic designers on the layout and design of new pages.

    Interns can expect to gain experience with programs and APIs that interface with popular web services like Twitter, YouTube, Facebook and others. Interns will gain critical marketplace experience working with leading-edge social and digital networks. We are a small, elite team; as such, interns will be expected to contribute to the production site. We offer a unique opportunity to gain real-world programming experience.

    For more information on Trendrr, visit the website: http://www.trendrr.com

    Skills/Experience Required:

    • Have significant CS coursework completed

    • Familiarity with Python

    • Have some familiarity with Java and JavaScript

    • Be passionate about new technology

    • Be familiar with latest web trends

    Applicable Categories for Job Function:
    Computer Drafting/Design, Computer Science/Statistics, Data/Database Management, Information Management/MIS, IT/Systems, Technical Support

    Internship is open to students and non-students for credit and is non-paid. A small transportation stipend is included.

    HOW TO APPLY:

    Please email careers@wiredset.com with:

    • A short, simple note detailing why we should invite you over for an interview and why we should hire you as an intern. This should be concise (more than 140 characters, fewer than two paragraphs).

    • Links to your social footprint (Twitter, Facebook, etc.)

    • Your resume in PDF format (please note that resumes in Word format will not be accepted).

    • Please include in subject line: “Programming/Development Internship: Trendrr”

    Posted by Shawn Smith, July 27th, 2010

    Social Media Analytics Internship at Trendrr: Fall 2010

    Wiredset is seeking intelligent and highly motivated interns to be part of the Trendrr team at the digital agency. Trendrr tracks trends across a variety of data sources, ranging from social networks to blog buzz, tweets and YouTube video views. Trendrr enables users to easily monitor the popularity of, and audience engagement around, brands, TV shows and films, and political campaigns, among other applications.

    For more information on Trendrr, visit the website: http://www.trendrr.com

    For more information on Wiredset, visit the website: http://wiredset.com

    Trendrr is looking for a Social Media Analytics Intern to work in the following areas:

    • Technical Development

    • Design Marketing

    • Data analysis around campaigns + programs

    • Business Development

    • Marketplace Analysis

    • Client relationship management

    • Generation of client-facing reporting

    Skills/Experience Required:

    • Possess strong analytical skills and strong written and verbal communication abilities.

    • Keen interest in and familiarity with the entertainment industry in general and TV, film and media specifically

    • Interest in consulting in the media/technology space

    • Experience in data/statistical analysis

    • Candidates should possess the ability to work independently and be forward-thinking

    • Potential candidates should either be currently enrolled in college or have a college degree

    Internship is open to students and non-students for credit and is non-paid. A small transportation stipend is included.

    HOW TO APPLY:

    Please email careers@wiredset.com with:

    • A short, simple note detailing why we should invite you over for an interview and why we should hire you as an intern. This should be concise (more than 140 characters, fewer than two paragraphs).

    • Links to your social footprint (Twitter, Facebook, etc.)

    • Your resume in PDF format (please note that resumes in Word format will not be accepted).

    • Please include in subject line: “Social Media Analytics Internship: Trendrr”

    Posted by Mark Ghuneim, July 26th, 2010

    Foursquare Check-In Analysis: July 11th through July 17th


    As we get ready for the launch of Trendrr v3, one of the most exciting aspects for me is the location-based analytics. We have mainly focused on location-based brand dashboards, some of which we will debut publicly at launch.

    Taken as a whole, the same data set also gives an overall view of the volume, velocity and behaviors on Foursquare.

    What becomes apparent very quickly when you visualize data this way is the intense rate of adoption, the volume and the deep level of engagement taking place.

    If you are a brand or company working in this space and want deeper metrics and visibility into your specific initiatives, Trendrr might be the right tool for you.



    Posted by Shawn Smith, July 26th, 2010

    Web Design Internship at Trendrr: Fall 2010

    Do you live and breathe web design? Do you have opinions on what makes a well-designed product? Do you want to be a part of a cutting-edge development team and contribute real work to a public product?

    Trendrr is a business intelligence service for digital and social media with clients ranging from influential media companies (television, music, film) to top Fortune 500 brands and political organizations. We are looking for an intern to help us not only with the user interface but also to develop innovative methods to visualize vast amounts of social media data.

    This is not a regular internship where you’re given menial tasks and busy work; this is an opportunity to gain authentic experience working with an agile product development team.

    Responsibilities:

    • Data visualization

    • Edit/create HTML page layouts and content

    • Edit/create CSS

    • Edit/create JavaScript

    • Graphic design

    • Image editing and optimization

    Qualifications

    Requirements:

    • Strong design skills and vision

    • Proficient with Photoshop and Illustrator

    • HTML (hand coded)

    • Basic JavaScript knowledge

    • Strong Cascading Style Sheet (CSS) skills

    • Strong knowledge of the Internet and emerging technology

    • Strong written and oral communication skills

    • Ability to work in a team environment, handle multiple tasks, and complete tasks quickly and effectively

    • Effective problem solver who can work with minimal supervision

    • Self-starter who combines a high level of creativity with strong organization skills

    • Enjoys building an innovative web application that makes customers happy

    Helpful (but not required):

    • Prior social media application development experience

    • Flash

    • jQuery (or other comparable JavaScript framework)

    • PHP

    • Adobe Flex

    • R

    • Canvas

    Internship is open to students and non-students for credit and is non-paid. A small transportation stipend is included.

    HOW TO APPLY:

    Please email careers@wiredset.com with:

    • A short, simple note detailing why we should invite you over for an interview and why we should hire you as an intern. This should be concise (more than 140 characters, fewer than two paragraphs).

    • Links to your social footprint (Twitter, Facebook, etc.)

    • Your resume in PDF format (please note that resumes in Word format will not be accepted).

    • Please include in subject line: “Web Design Internship: Trendrr”

    Posted by Alex Mann, July 20th, 2010

    Trendrr v3 Business Logic Infographic

    Last week, we provided an overview of the underlying system architecture supporting Trendrr v3, our revamped version of Trendrr slated to launch in early August. Today we want to share with you the business logic incorporated into the work-flow process of Trendrr v3:

    The work-flow process begins with tracking conversation and activity streams, listening to the conversation through curation and filters, measuring the conversation by refining and analyzing the streams, and finally responding to the conversation by understanding your market and targeting influencers.

    The graphic outlines many of the new features available in Trendrr v3 and how they help users accomplish the ultimate goals of our service: making money, saving money and building a brand’s equity.

    Posted by Alex Mann, July 13th, 2010

    Trendrr v3 Preview and System Architecture

    In August we are launching a completely revamped version of Trendrr with a new architecture, a redesigned user experience and a comprehensive workflow process. (We may have mad math and science skills, but Trendrr has always been a tool born from marketing and media client needs.)

    We are launching Trendrr v3 in the next few weeks. The graphic below, a visual of the system architecture behind Trendrr v3, represents the most impressive and significant changes we have made.

    A selection of the features we are most excited about (we’ll save the rest as a surprise) includes:

  • Real-time processing with intelligence per minute (when available from source)
  • Comprehensive curation layer (Curatorr): filter by influence, location, celebrity, verified account, hashtag and links. Identify, target and optimize users for advertising, marketing and promotions
  • Abundant data sources: microblogs, blogs, sales, location-based services, search, social networks, video and more
  • In-depth real-time dashboards for Twitter, blogs and press: monitor volume, location, sentiment, gender, influence and top links
  • Detailed reporting and infographics: Instantly export social media insights into beautiful reports
  • Recommendation engine: Account level suggestions on how to act on jumps in your social media data
  • Social CRM: Understand what will grow your social customer relationship metrics with our social CRM dashboard
  • Response layer: Create response groups of influential users for social CRM, customer service and communication programs

    Posted by Shawn Smith, July 7th, 2010

    Trendrr Featured in the Intelligence Group’s Spring 2010 Cassandra Report

    Trendrr was featured in the Intelligence Group’s Spring 2010 Cassandra Report, in which our real-time dashboard was highlighted in connection with the entertainment industry. The report cited our work with Oxygen’s program The Bad Girls Club and discussed the importance of capturing, tracking and analyzing the conversation that takes place about a brand online.

    Posted by Shawn Smith, June 2nd, 2010

    Trendrr CEO on Location Based (Geo) Marketing Panel at Internet Week 2010


    As part of Internet Week, Wiredset/Trendrr CEO and founder Mark Ghuneim will be sitting on a panel with industry experts and strategists for “The Future of Location Based (Geo) Marketing” on Monday, June 7, 2010 from 5:30 PM – 7:30 PM ET.

    Location based services (LBS) are all the rage these days. Is geo-location hype, web 3.0 or something in-between? What innovative ways could location awareness impact marketing and advertising? Who stands to benefit more, large brands or small business (mom & pops)? Which LBS offerings (Foursquare, Gowalla, Pegshot, Loopt, Brightkite, etc.) are most applicable for my agency or business? Join our panel of expert strategists as we explore the future of location based marketing. [Internet Week]

    Moderated by:
    Erick Schonfeld, Co-Editor, TechCrunch (@erickschonfeld)

    Panelists:
    + Ian Schafer, CEO, DeepFocus (@ischafer)
    + Mike Schneider, VP, Director Digital Incubator, Allen & Gerritsen (@schneidermike)
    + Joshua Karp, Digital Media Manager, PepsiCo (@jkarpf)
    + Mark Ghuneim, CEO, WiredSet / Trendrr (@markghuneim)

    The Future of Local Media (FLM) holds a monthly mixer focusing on people, technologies and services helping to shape the future of local media. Follow @futureoflocal for breaking news and updates.