Thursday, May 27, 2004

Summary of Song Ratings

Here is a summary of SoundRater blog ratings for songs that are currently available to be legally downloaded over the Internet.

Excellent Download Rock Attribution-NonCommercial-ShareAlike MP3 Horton's Choice

Good Download Rock Attribution-NonCommercial-ShareAlike MP3 Horton's Choice

Good Download Americana Attribution-NoDerivs-NonCommercial MP3 Christie McCarthy

Good Download Rock Attribution-NonCommercial-ShareAlike MP3 Horton's Choice

Good Download Rock Attribution-NonCommercial-ShareAlike MP3 Horton's Choice

Excellent Download Rock/Country Attribution MP3 Lisa Rein

(Rating - Link - Genre - Creative Commons License - Format - Attribution)

To download music "right-click" on the Download link and select "Save Target As..."

Tuesday, May 11, 2004

Gmail, Search & Content Discovery

This blog is concerned with the discovery of content over the Internet using the types of "social network" that are beginning to appear. It is anticipated that "searches" for content will produce unique results that are specific to the tastes of the individual conducting the search, rather than providing the same result for a given set of search criteria, whoever the searcher is.

The difference is best illustrated by considering two songs placed on the Internet that have been correctly labeled with what happens to be the same metadata (e.g. Genre:Rock, Format:MP3, Size:4.83MB etc). A search for "Rock MP3" using ordinary matching methods against this metadata should identify both songs, possibly ranking one above the other because it has more, or better quality, links to it, which is taken to indicate that collectively it is considered to be "better" content. The individual conducting the search may, however, consider one song to be "Excellent" and the other to be "Terrible". Another individual conducting the same search may consider the "Excellent" song to be "Poor" and the "Terrible" song to be "Good"! So, in practical terms, the ultimate objective of this blog is for searches to enable the discovery of content, or resources, with results tailored so that each searcher considers them to be excellent, even when different searchers have different tastes.

An underlying framework that may enable this type of search result would be a virtual network of links between peers with each link representing some form of agreement in tastes and/or trust in others' ability to identify great content. Individuals would be represented by nodes in such a virtual network with connections in this (actually these) network(s) representing relationships between peers. The local information about each node in this type of virtual network could be constructed in many ways (see SoundRatings.com for one attempt - I anticipate that blogging software will be best placed to do this, possibly along with newsreaders). Searches from different nodes, i.e. by different individuals, would be conducted "locally" using distributed search software/algorithms to match search terms against trust metrics (i.e. links) from the perspective of that node. Results from different nodes would provide unique results, even if the same search terms were used, resulting from the different location of the searcher (node) in the virtual network. I have always assumed that such search techniques would need to be fully decentralised as the computational resources required rapidly become huge.
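
The idea of searching "from the perspective of a node" can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the peer names, the trust weights and the one-hop weighted average are all hypothetical, and a real network would need something far more robust.

```python
# Hypothetical data: who trusts whom (0..1), and what each peer has rated.
TRUST = {
    "alice": {"carol": 0.9, "dave": 0.1},
    "bob": {"carol": 0.1, "dave": 0.9},
}
RATINGS = {
    "carol": [
        {"link": "song_a", "genre": "Rock", "rating": 90},
        {"link": "song_b", "genre": "Rock", "rating": 20},
    ],
    "dave": [
        {"link": "song_a", "genre": "Rock", "rating": 30},
        {"link": "song_b", "genre": "Rock", "rating": 85},
    ],
}


def personalised_search(searcher, genre, trust=TRUST, ratings=RATINGS):
    """Rank links for one genre by the trust-weighted average rating
    given by the searcher's directly trusted peers (one hop only)."""
    weighted_sum = {}
    weight_total = {}
    for peer, weight in trust.get(searcher, {}).items():
        for item in ratings.get(peer, []):
            if item["genre"] != genre:
                continue
            link = item["link"]
            weighted_sum[link] = weighted_sum.get(link, 0.0) + weight * item["rating"]
            weight_total[link] = weight_total.get(link, 0.0) + weight
    scored = [(weighted_sum[l] / weight_total[l], l) for l in weighted_sum]
    scored.sort(reverse=True)  # highest predicted rating first
    return [link for _, link in scored]


alice_results = personalised_search("alice", "Rock")
bob_results = personalised_search("bob", "Rock")
print(alice_results)
print(bob_results)
```

The same search term gives the two searchers opposite orderings, purely because they sit at different places in the (made-up) network.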

It is, however, possible to construct a virtual network of this type using the "To:" and "From:" links in email. Interestingly, Google may have the huge search resources necessary to make such a system work, and as this blog is hosted by blogger.com I have been able to open a Gmail account whilst it is in Beta. I thought it would be fun to see if the large email archive they offer (1,000MB) and associated search functions would enable me to simulate a decentralised (peer-to-peer) content discovery network of the type outlined above.
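
As a rough sketch of how "To:" and "From:" headers could be turned into network links, the following Python uses the standard library email parser. The addresses and messages are placeholders I have made up for illustration, and counting messages per sender/recipient pair is just one assumed way to weight a link.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical raw messages; the addresses are placeholders.
MESSAGES = [
    "From: Rater <rater@example.com>\nTo: me@example.org\n\ntitle: Oxygen\n",
    "From: rater@example.com\nTo: me@example.org, peer@example.net\n\ntitle: Another\n",
    "From: me@example.org\nTo: rater@example.com\n\nRating: 87/100\n",
]


def build_links(raw_messages):
    """Count directed (sender, recipient) links found in the headers."""
    links = defaultdict(int)
    for raw in raw_messages:
        msg = message_from_string(raw)
        _, sender = parseaddr(msg.get("From", ""))
        recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
        for recipient in recipients:
            if sender and recipient:
                links[(sender.lower(), recipient.lower())] += 1
    return dict(links)


links = build_links(MESSAGES)
for (sender, recipient), count in sorted(links.items()):
    print(sender, "->", recipient, count)
```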

To do this I sent a number of emails from my soundrater at Yahoo email account to my Gmail account having first set up a filter to automatically archive such messages i.e. emails received from soundrater at Yahoo.com skip the inbox. This starts to build a "local" database for my node without interfering with the use of Gmail as an email account. Note: It is not really local as Google are holding the information on their servers.

Firstly, I should point out that searches are for a link to content, and not the content itself, so the body of the email contains a link - the content is not included as an attachment. This limits the size of emails to under 10k meaning that millions of ratings can be stored in each Gmail archive. In order to enable a search for content I included some elementary metadata in the body of the email as outlined below:

title: Oxygen
link: http://www.actsofvolition.com/steven/hc/hortonschoice_oxygen.mp3
creator: Hortons Choice
Rating: 87/100
Review: Excellent
genre: AlternRock
genre_id: 40
format: audio/mpeg
license: http://creativecommons.org/licenses/by-nc-sa/1.0/
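
For anyone wanting to read these fields back out of an email body, a minimal Python sketch follows. The function name and the choice to lower-case the keys are my own assumptions; the field names and values are the ones listed above.

```python
EXAMPLE_BODY = """\
title: Oxygen
link: http://www.actsofvolition.com/steven/hc/hortonschoice_oxygen.mp3
creator: Hortons Choice
Rating: 87/100
Review: Excellent
genre: AlternRock
genre_id: 40
format: audio/mpeg
license: http://creativecommons.org/licenses/by-nc-sa/1.0/
"""


def parse_rating_email(body):
    """Turn 'key: value' lines into a dict with lower-cased keys."""
    record = {}
    for line in body.splitlines():
        # Split on the first colon only, so URLs in the value survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        record[key.strip().lower()] = value.strip()
    if "rating" in record:
        # "87/100" -> 87
        record["rating"] = int(record["rating"].split("/")[0])
    return record


record = parse_rating_email(EXAMPLE_BODY)
print(record["title"], record["rating"], record["genre"])
```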

It would be far better to use proper RDF, but as readers of this blog will know I have been unable to find an XML namespace that will support a rating scale of 0-100. As I don't want to get into that debate now I simply used the data outlined above. Hopefully Atom, which is supported by Blogger and other software, will support such a rating module in the future.

Having received a few "peer metadata rating" emails directly into the archive folder it was then possible to use the Gmail search functionality to search what was in effect a database of email metadata, including peer ratings, for links to resources.

First I searched according to genre in the body of emails - "Rock" resulted in other emails I had sent and received with the genre "Rock" but not "AlternRock" as listed above. I couldn't find a wildcard symbol such as "*Rock" although it may well be an option. However, "AlternRock" did pick up all the emails with this match and ranked them according to date received.

I then tried "AlternRock 87" and successfully identified the above email (other emails had ratings of 67/100 and 38/100). I couldn't find a search criterion such as ">80" to represent numerical values greater than 80, so I was not able to search for AlternRock emails with a rating of greater than 80. However, as I use reviews of "Excellent" to represent ratings of 81-100, Good for 61-80, Neutral for 41-60, Poor for 21-40, and Terrible for 1-20, I was able to combine "AlternRock Excellent" to find AlternRock ratings in the range 81-100.
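
The rating bands, and the ">80" search that Gmail did not offer, are easy to express outside Gmail. The Python below is a sketch only: the function names and the small archive are hypothetical, apart from the three ratings (87, 67 and 38) mentioned above. A rating of 0 is rejected because the bands as I use them start at 1.

```python
BANDS = ["Terrible", "Poor", "Neutral", "Good", "Excellent"]


def review_label(rating):
    """Map 1-20 -> Terrible, 21-40 -> Poor, ..., 81-100 -> Excellent."""
    if not 1 <= rating <= 100:
        raise ValueError("rating must be between 1 and 100")
    return BANDS[(rating - 1) // 20]


def search(archive, genre, min_rating):
    """Return records of the given genre rated strictly above min_rating."""
    return [r for r in archive if r["genre"] == genre and r["rating"] > min_rating]


# Hypothetical archive built from the three ratings mentioned in the post.
ARCHIVE = [
    {"title": "Oxygen", "genre": "AlternRock", "rating": 87},
    {"title": "Second song", "genre": "AlternRock", "rating": 67},
    {"title": "Third song", "genre": "AlternRock", "rating": 38},
]

hits = search(ARCHIVE, "AlternRock", 80)
print([(r["title"], review_label(r["rating"])) for r in hits])
```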

The next step was to simulate a user downloading the content which I simply did by "right-clicking" on the link in the body of the email, and selecting "Save Target As...". To record my rating of the content I sent a reply to soundrater at yahoo.com containing a rating figure out of 100. The reply was automatically associated with the original email in my archive folder. Using the reply functionality to record my rating of the associated resource effectively started to create my profile (and can be used for other functions such as automatically forwarding the emails to a peer of mine who has asked to be notified of any "AlternRock" email with a rating of over 70/100 created by myself, as well as many other purposes). Implicit profiling mechanisms (which do not require a user to explicitly declare a rating) have advantages too, although I consider such mechanisms to be outside the scope of this little article.

The important point is that I can use the archive folder to establish a profile for myself which would enable subsequent searching of that archive to find results that were individually tailored to my preferences. It should be noted that up to now any user with the same emails in their Gmail account searching for "AlternRock Excellent" would get the same result. The tailored search algorithms would not need to be very sophisticated as the search is only of a local database ranking the best matches against my local profile. The creation of a profile could be bootstrapped into existence by, for example, downloading or accessing a user's iPod/iTunes ratings (which also use a scale up to 100).
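
To give a feel for how unsophisticated such a tailored ranking could be, here is a Python sketch. It assumes my profile is just the ratings I have sent back as replies, and it scores each rater by how closely their ratings match mine on songs we have both rated. The agreement measure, the names and the numbers are all invented for illustration.

```python
def agreement(my_ratings, rater_ratings):
    """1.0 means identical ratings on shared songs, 0.0 means no overlap
    or ratings that are 100 points apart on average."""
    shared = my_ratings.keys() & rater_ratings.keys()
    if not shared:
        return 0.0
    mean_gap = sum(abs(my_ratings[s] - rater_ratings[s]) for s in shared) / len(shared)
    return 1.0 - mean_gap / 100.0


def rank_unheard(my_ratings, archive):
    """Rank songs I have not rated by (rater agreement * rater's rating).
    A song rated by several raters appears once per rater."""
    candidates = []
    for rater, rated in archive.items():
        weight = agreement(my_ratings, rated)
        for link, rating in rated.items():
            if link not in my_ratings:
                candidates.append((weight * rating, link, rater))
    candidates.sort(reverse=True)  # best predicted match first
    return candidates


# Hypothetical profile (my reply ratings) and archive (ratings received).
MY_RATINGS = {"s1": 90, "s2": 20}
RECEIVED = {
    "rater_x": {"s1": 85, "s2": 30, "s3": 80},
    "rater_y": {"s1": 20, "s2": 90, "s4": 95},
}

ranked = rank_unheard(MY_RATINGS, RECEIVED)
print([link for _, link, _ in ranked])
```

Here rater_y gave the higher raw rating (95 for s4), but s3 is ranked first because rater_x has agreed with me far more closely in the past.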

Another feature to consider would be the recommendation by Google/Gmail of "expert raters/reviewers" based on the match of a user profile against "expert raters/reviewers" ratings to date, along with other data indicating the suitability of the rater/reviewer. This would then help to identify excellent content according to an individual's tastes as the ratings emails going into the archive would already have been pre-selected according to some rating correlation. I could go on and on, but I think the idea has been outlined.

As mentioned above, in my opinion blogging software may ultimately be best placed to support such decentralised search features, as it is used by individuals (bloggers) who tend to be well informed, opinionated people who are motivated to discover content and are then well placed to make it available to others.

It was interesting to see at the bottom of this BBC article the following:

"As for the future, Mr Williams [Evan Williams, co-founder of Blogger] said they would be looking at incorporating Google's search technology into Blogger, offering subscribers the ability to search their blog." Date: 10 May, 2004.

Could Gmail be incorporated into such offerings? Would it be possible to search shared elements of other Gmail accounts in a peer-to-peer fashion? Also, John Battelle's analysis of Google's S-1 filing posted on boingboing.net states that:

"8. We [Google] bridge the media and tech industries (interesting), which are in flux, so we've chosen a two-class stock structure similar to the NYT, WashPost, and NYT that helps us avoid being taken over by those forces;" Date: 30 April, 2004

This suggests that Google sees parallels between its operations and those of traditional media organisations. A technical difference is that conventionally content is filtered before broadcasting, whilst on the Internet things happen the other way round. Clay Shirky has covered this in an article The Music Business and the Big Flip. The filtering challenge associated with content on the Internet is to ensure that the content an individual views is, in their opinion, excellent before they see it.

Finally, if user profiles were to be used in collaborative filtering functions the universe of other profiles against which they are matched should be as large as possible. For optimal results this implies that open systems/protocols should be used as they would enable the participation of all profiles in such discovery services, as well as enabling third parties to develop innovative features on top of the data. However, as far as this email idea is concerned, large email account providers (Yahoo & Hotmail) may be able to impose their own closed systems/protocols as the large number of email accounts they control could make such searches viable within their own offerings. If the Google Gmail service takes off they may also be able to introduce a closed system to enable such tailored discovery services. To leapfrog ahead of the competition with this type of email search functionality would require them to execute well, and may depend on Yahoo and Microsoft not taking full advantage whilst they have a significant lead in terms of the number of email accounts held.
