Wednesday, October 29, 2014

Just thinking about film recommendation systems


A glance at DOUBAN's film recommendation system
The course 'Social Media Analytics' is almost at its end. The most significant change for me is that it helped me build a concept of classification and see how important it is. Many of the tools we use to analyze information put a specific emphasis on classification. We need to classify user groups, users' emotional trends, users' preferences and so on to establish an integrated platform. As everyone knows, there are a few popular social media apps in China, such as Weibo, QQ Zone and Renren, but few of them grew out of our own ideas; we are used to copying creations from overseas, and most of those copies happened to become popular in China as well. Social media plays a significant role in our lives, not only because we depend on WeChat every day, but also because it is a necessary channel for businesses to gain information from customers. In the last class we also learned about movie recommendation systems, another very important application of social media. Most of us have searched for certain kinds of films or recorded our viewing history to keep track of our preferences, and all of this provides plenty of information for building a smart recommendation system, which in turn depends on a well-designed movie rating system. Knowing the principles behind the apps we use frequently gives us a firm foundation for designing a similar system in our project.



And we happened to learn about recommendation systems in class

Some basic ideas of recommendation systems

Basic Idea of Recommender Systems
– Use computer algorithms to filter information for the users
– Compare user preferences and item characteristics on a large scale
In collaborative filtering, we usually work on the user-item matrix as shown above. The horizontal and vertical axes show items and users respectively, and the matrix as a whole records the relationship between the two. To measure how much a user is into a film, we use a specific algorithm to score that particular user's preference.
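To make the user-item matrix a bit more concrete, here is a minimal sketch in Python. The ratings, user names and film names are made up for illustration, and the scoring rule (a similarity-weighted average over other users, using cosine similarity) is just one common choice, not necessarily the algorithm shown in class or used by DOUBAN.

```python
import math

# Hypothetical ratings: rows are users, columns are films; 0 means "not rated yet".
films = ["Film A", "Film B", "Film C", "Film D"]
ratings = {
    "user1": [5, 4, 0, 1],
    "user2": [4, 5, 1, 0],
    "user3": [1, 0, 5, 4],
}

def cosine(u, v):
    """Cosine similarity computed over the films both users have rated."""
    shared = [(a, b) for a, b in zip(u, v) if a > 0 and b > 0]
    if not shared:
        return 0.0
    dot = sum(a * b for a, b in shared)
    norm_u = math.sqrt(sum(a * a for a, _ in shared))
    norm_v = math.sqrt(sum(b * b for _, b in shared))
    return dot / (norm_u * norm_v)

def predict(user, film_index):
    """Similarity-weighted average of the other users' ratings for one film."""
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row[film_index] == 0:
            continue  # skip the user themselves and users who have not rated it
        w = cosine(ratings[user], row)
        num += w * row[film_index]
        den += w
    return num / den if den else 0.0

# user1 has not rated Film C; estimate the score from user2 and user3.
print(films[2], round(predict("user1", 2), 2))
```

Because user1's tastes are much closer to user2's than to user3's, the estimate for Film C is pulled toward user2's low rating.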
Going a step further, we extract some basic elements of a movie, such as fantasy versus realistic and romantic versus unromantic, and form two matrices that illustrate to what extent a movie expresses these elements and to what extent a user prefers them, as follows:
And what is the assumption? It is that a set of latent features exists and that a user's rating of an item is a linear combination of these latent features. This gives us a simple foundation for a recommendation system. It is also a vital part of our project, which is why we decided to look into recommendation systems.
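The latent-feature assumption can be sketched as a small matrix factorization: each user u gets a vector P[u], each film i gets a vector Q[i], and the predicted rating is the sum over features f of P[u][f] * Q[i][f]. The code below is only an illustration under my own assumptions: the rating matrix is made up, and the training method (stochastic gradient descent with a little regularization) and all parameter values are choices I picked for the sketch, not something taken from the class slides.

```python
import random

# Hypothetical user-item matrix: rows are users, columns are films, 0 = unknown.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ R[u][i]."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                if R[u][i] == 0:
                    continue  # only known ratings contribute to the error
                err = R[u][i] - sum(P[u][f] * Q[i][f] for f in range(k))
                for f in range(k):
                    pu, qi = P[u][f], Q[i][f]
                    P[u][f] += lr * (err * qi - reg * pu)
                    Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: a linear combination of the latent features."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

P, Q = factorize(R)
# Fill in one of the unknown cells, e.g. user 0 and film 2.
print(round(predict(P, Q, 0, 2), 2))
```

With k=2 the two learned features play the role of elements like "fantasy" or "romantic", although nothing forces them to line up with labels a person would choose.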



Friday, October 17, 2014

The principles of information analytics, applied that widely...

Social media looks like a very complex network, and it really is. Fortunately, we have some tools to figure out the relationships and the significant information in it. In our daily life on the Internet, we can hardly find an app or website that doesn't contain any information from users. For example, when browsing 'sina weibo', we can see that anyone can leave a comment under someone's microblog, even under a big star's, unless comments have been specifically disabled. That makes it troublesome for the administrators to manage all the comments made by so many people. To deal with this, I suppose Sina has a strict and effective algorithm to detect and filter out annoying or sensitive words. The idea of information analysis is applied in many fields, and it makes the management of products and human resources more intelligent.
According to a survey by an overseas institution, Sina Weibo ranks third among social networks in China. I refer to a study I found on the Internet to help illustrate what the social network on Sina generally consists of:
In the middle are the most influential accounts on Sina Weibo, usually made up of media accounts, website accounts, government accounts, celebrity accounts, and also some famous grassroots accounts. We can roughly call them the 'rich club'. Although Sina Weibo is very densely connected, individuals are separated from each other but closer to the celebrities. Thus the most influential users become the core of Sina Weibo, and they are followed by the common users. As the core of Sina Weibo, they have great influence on how information spreads across the site.
By the way, all the data shown above was collected with Python. The relationships above reveal that the members of the rich club are closely connected to each other, which suggests a similar phenomenon may exist on other social media, at home and abroad.
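One way to put a number on "closely connected" is the rich-club coefficient: take the nodes whose degree is greater than k, count the edges among them, and divide by the number of edges they could possibly have. Below is a minimal sketch on a made-up graph. The edge list is hypothetical, and treating the graph as undirected is a simplification, since following on Weibo is directed.

```python
from collections import defaultdict

# Hypothetical undirected graph: a, b, c are the "celebrities",
# d, e, f, g are common users who each connect to one celebrity.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("a", "d"), ("b", "e"), ("c", "f"), ("a", "g"),
]

def rich_club(edges, k):
    """Rich-club coefficient: edge density among nodes with degree > k."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    rich = {n for n, d in degree.items() if d > k}
    if len(rich) < 2:
        return 0.0  # density is undefined for fewer than two nodes
    inner = sum(1 for u, v in edges if u in rich and v in rich)
    return 2 * inner / (len(rich) * (len(rich) - 1))

print(rich_club(edges, 0))  # density over all nodes
print(rich_club(edges, 1))  # density among the high-degree nodes only
```

In this toy graph the high-degree nodes are fully connected to each other while the graph as a whole is sparse, which is the pattern the rich-club picture describes.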


Friday, October 3, 2014

  One month has passed, and we have all become more familiar with our courses, our teachers, and certainly our classmates. Through discussion with each other, some basic concepts have become clearer. So let's see what we've learned these days.


Sentiment Analysis and Opinion Mining 
In the last lesson we learned about sentiment analysis and opinion mining. The introduction to this class made it feel like a kind of psychology class; we even took a glance at how human beings extract their beliefs from what they think of the things happening around them. No matter what angle we use to learn what the world actually is, there are always two sides of information: facts, and opinions and feelings. The latter always come with different polarities, so how can we tell one from the other? Although these all seem like abstract things, we have also developed some mathematical methods to analyze them. They are of great use and are fundamental tools even in some famous tech companies, so it's really great fun for us to step into the world of social media.
The picture above shows the general structure of a sentiment analysis system on an overseas e-learning website. From the picture, we can tell that the goal is to enable subjectivity and sentiment analysis for generating feedback from user-generated discourse and for supporting information search on this e-learning website:
  • investigate knowledge- and corpus-based methods for subjectivity and sentiment analysis
  • determine the semantic orientation and strength of the opinions
  • identify the targets of the opinions
  • identify the holders of the opinions
A specific score can be given to each word
Here is a very interesting picture that shows how different scores can be given to a word in three dimensions according to SentiWordNet.
The vertical dimension shows the SO (subjective and objective) polarity score, while the horizontal axis shows the PN (positive and negative) polarity score. What may be confusing is that subjective words cover a much wider range than objective words, and this was explained well in class. However, the SentiWordNet dictionary cannot cover every word, and this noticeable problem has yet to be addressed by other methods.
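As a rough illustration of word-level scoring, here is a minimal sketch with a tiny hand-made lexicon. The words and their (positive, negative) scores are hypothetical values I made up in the style of SentiWordNet, not entries copied from it, and averaging over the words of a sentence is just one simple way to combine them. The objective score of a word is taken as 1 - positive - negative.

```python
# Hypothetical (positive, negative) scores in the style of SentiWordNet.
# The objective score of a word is 1 - positive - negative.
LEXICON = {
    "great": (0.75, 0.0),
    "good": (0.625, 0.0),
    "boring": (0.0, 0.625),
    "film": (0.0, 0.0),
}

def score(text):
    """Return (PN polarity, subjectivity, unknown words) for a piece of text."""
    words = text.lower().split()
    known = [w for w in words if w in LEXICON]
    unknown = [w for w in words if w not in LEXICON]  # the coverage problem
    if not known:
        return 0.0, 0.0, unknown
    pos = sum(LEXICON[w][0] for w in known) / len(known)
    neg = sum(LEXICON[w][1] for w in known) / len(known)
    # PN polarity: positive minus negative; subjectivity: positive plus negative.
    return pos - neg, pos + neg, unknown

print(score("great film"))
print(score("boring plot"))
```

The third return value lists the words the lexicon does not know, which is exactly the coverage gap mentioned above: those words contribute nothing to the score until some other method handles them.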