Serendip is an independent site partnering with faculty at multiple colleges and universities around the world. Happy exploring!

Character Building in the Digital Humanities

fawei

Note: When the list of papers went up, I realized I had accidentally put this in the GIST paper category on the Thursday before break. I changed it to the Web Paper 2 category on Thursday, 10 March. I knew I had slipped up somewhere! If it's too far past the deadline, that's fine...


Do human creations have personalities? With increasingly complex and (seemingly) intelligent technology today, this kind of question can cause some confusion and, perhaps, fear. Creations such as Watson, the Jeopardy!-winning computer, which is constantly improving in artificial intelligence, have at least some aspects of human intelligence and decision-making. Even cultural icons such as Rosie the Riveter convey a particular mood and image. They even have specific genders (and, in Rosie’s case, an unexpected gender role) attributed to them. But even with these qualities, it is difficult to closely inspect their ‘personalities’ without any interaction. They were likely not made for extensive human interaction: the portrayals of both Watson and Rosie are (or were) controlled by their own authorities and directed toward a specific purpose. The audience has little influence on them, which perhaps removes aspects that could make them seem more human.

But what happens if the character is built by the audience?


Please listen to this video until the vocals start:



There probably doesn’t seem to be anything notable about this song at first. The sound levels might not be perfect, and it might seem a little generic. The voice is somewhat high-pitched, but that’s not unusual for a Japanese pop song. One odd thing takes a while to notice, though: the singer doesn’t breathe. That’s because the vocals were created with a computer.


Brief history

As regular visitors to video sites might know, the program used is a Vocaloid, in this case a Vocaloid ‘named’ Hatsune Miku by the software’s developers. Vocaloid programs, including a few English-speaking versions, were first released in 2004 by the Yamaha Corporation, and today there are over 20 releases, each with either different voice samples or additional sounds for previously released versions. There are also various fan-made voice banks for UTAU, a freeware program that works similarly to Vocaloids.

In short, the program contains a set of voice recordings by a person (in Miku’s case, Japanese voice actress Saki Fujita), and its score editor lets the user assign pitches and lengths to phonetic sounds, generating the vocals of a song (shown in this screenshot). Because the vocals are computer-controlled, they can be much faster, or hold notes for longer, than most humans can manage.

The intended purpose of Vocaloids is quite general. While some musical training seems necessary for composition, the interface is relatively simple and has been used by both professional musicians and beginners or hobbyists to make fairly successful songs. There is also no required vocabulary, and there are no extensive content guidelines, at least for personal use.


Character building


Vocaloids are versatile tools on their own, but beyond supplying vocals for songs without human singers, they seem to have taken on a strange sort of life in Internet culture. The way they are seen today is very different from a mere program or a picture on a box.


Some argue that Vocaloids did not gain popularity until they were given faces to match their voices: a few early Vocaloids did not have human representations and seem to have been aimed at more professional users. Visuals do seem to be a drawing factor for the Vocaloid, even beyond the box. The original box art presents a relatively simple cartoon mascot with some cyber-ized clothing and features such as headphones and lights (and often unnatural-looking hair), but the spread of these images can be credited to fan art: user-created images receive almost as much attention as the songs do.


Unlike fully human representations such as Rosie the Riveter, these stylized characters are not directly intended to be imitated. Their images seem to serve the reverse purpose: they make the characters more similar to existing humans by giving them a form through which (in the hands of the artistically inclined) emotions and actions can be portrayed.

Genre generation

Many Vocaloid songs are romantic in theme, much like the majority of songs produced today. But the idea of a computer singing has inspired several new themes that may not have occurred to (or been appropriate for) human singers.


Some, like the song above, mesh human emotions with the technological aspects of the ‘singer.’ These songs deal with an odd mixture of human and machine anxieties, such as the fear of being deleted, ‘uninstalled,’ or malfunctioning, that wouldn’t be expected from a human singer. This song in particular also takes advantage of the Vocaloid’s inhuman vocal possibilities (for example, singing extremely fast), which highlight its machine aspects by contrast.



On the less serious side, there has also been a rise in bizarre songs such as this one. Unlike the songs mentioned previously, songs of this kind are generally only possible for Vocaloids because a person would not want to sing them. Vocaloids, however, possess only as much inhibition as their users (and their users’ hard drive space). This is sometimes unfortunate, in ways we will discuss later.

Additional Reception

The ways Vocaloids interact with their community and following also highlight the relationship between humans and technology. Weekly and daily ‘top lists’ of Vocaloid songs have been created and broadcast regularly on a Japanese video site (and reposted to YouTube). Vocaloid albums have been released and did reasonably well on Asian charts. As a result, several composers of songs featuring Vocaloids managed to find professional work in the music industry (for example, livetune, the first to release a Vocaloid CD). In at least a few cases, their accompanying artists and music video animators were able to follow suit (for example, huke, who created a cartoon series out of artwork for a song).

As a result of the product’s popularity, Yamaha and developer Crypton Future Media released many more versions and voices of the software after 2007, including extension packs, or ‘Appends,’ for previous versions, the first of which was for Miku. These included libraries with names such as ‘Soft,’ ‘Light,’ and ‘Dark.’ Strangely enough, these tonal variations were created to add more human moods to the Vocaloid’s voice.

But perhaps one of the most extreme Vocaloid developments is the staging of physical concerts. While the musicians were human, the singers were the Vocaloids themselves. How, then, could a concert be held for a computer program? With a setup of projectors and screens.



A Digital Humanities Endeavour?

The way Vocaloids have come to be seen, and to exist, today is due to the creators, various supporting companies, users, and fans, all contributing to create something that spreads across several media. This illustrates well an idea discussed by literary critic and professor N. Katherine Hayles: the development of the ‘Digital Humanities.’

The goal of Hayles’ concept is to integrate the traditional fields of the Humanities with the new technology available, rather than setting them in opposition. Basic examples include presenting digital art alongside text (Hayles 11) or using computer code as text (13). Admittedly, when I first read Hayles’ description of the digital humanities, it was hard to conceive of a Humanities project in which it could be implemented to full effect. Hayles herself says that the method of collaboration between the digital and the traditional needs to be figured out on a ‘case-by-case basis.’ I do think that the Vocaloid phenomenon has been an example of the Digital Humanities at work on a very large scale, almost creating a human out of technology and the ideas of a large number of people.

Unlike the previously mentioned Watson and Rosie the Riveter, the technology and image of these characters are not heavily controlled by their source. The community has as much power as the publishers, and at times more. The image, songs, behavior, and thus the ‘personality’ of these characters are determined by the community as much as by the original cover designers, the software programmers, the singers who provided the voice samples, or the company that markets the product. As Hayles says:

‘Implementing such projects requires diverse skills, including Traditional scholarship… programming, graphic design, interface engineering, sonic art and other humanistic, artistic and technical skills. Almost no one possesses all of these skills, so collaboration becomes a necessity…’ (Hayles 9)

And so, it seems, collaboration did create something better than any individual part.

As expected with such a wide range of contributors in various fields, Hayles advises that peer review ‘should be re-thought, along with the institutions of authority that undergird them’ (Hayles 15). In the Digital Humanities, peer review and critique are seen as rigorous, and as more productive than detrimental. One way Hayles proposes to reform peer review is to release the project for ‘open review,’ where, using today’s vast communication networks, the product is received and reviewed by amateurs and experts worldwide. Here the focus on the community, rather than the publishers, is more evident than ever: open review ‘shifts the process of peer review from one that determines whether a manuscript should be published to one that determines how it should be received’ (Hayles 16).

Unfortunately, communal control over Vocaloids does not always have pleasant results. Here we can see a flaw in the Digital Humanities process, at least in this field.


‘Open Review’ Problems

There are few restrictions on the use of Vocaloids, which may account for some of the program’s popularity. Legally, the Vocaloid is treated like an instrument, the rights to songs go to their creators, and many creators put their songs up for free download. But control over the generated content itself is low: although the software’s terms of service imply that offensive material should be avoided, there is no way to judge a product until after it has been released to the public. While offensive material is no surprise on the Internet, and may even have some merit as social criticism, many of these songs are created purely for shock, often taking advantage of a female voice for sexual exploitation.


Similarly, although the companies behind the Vocaloids legally own the character designs, overly sexual or violent images are common. Extreme or pornographic images arise for reasons close to those behind offensive songs: easy access to control and the sexualization of Vocaloids with female voices and characters. This is not ideal, for obvious reasons, but these examples do show how a contributing community can construct gender. It’s just a shame that this shows most prominently in exploitative ways.

Robot worries

Aside from potentially worrisome content, it is understandable that some critics are anxious about the growing influence of technology. A computer-controlled voice with greater range and consistency than most singers, paired with songwriters and composers who often work for free, could be seen as a threat to professional (human) artists. Not everyone opposes the introduction of this technology, though. In fact, one musician said he saw a use for the software in preserving the voices of aging singers.

A more prominent concern for the software publishers seems to be the opposition to, and poor sales of, English Vocaloids in Western countries. This could be for a number of reasons. There are vast differences between lifestyles in Japan and much of the United States that might make computer-generated music more practical and cost-effective in Japan. In addition, long-standing cultural differences may make the ideals of one culture difficult for the other to accept; for example, many Americans dislike the characters’ ‘anime’-styled art, because many anime cartoons are overly sexualized or represent foreigners poorly.


Concluding thoughts

Whether the results are good or bad, the group generation of character for what started out as a computer program is an interesting example of the Digital Humanities in action. If nothing else, it shows the scope of the projects made possible by a group of contributors spread across a variety of media and abilities. In the case of Vocaloids, a community seems to have created characters, images, and even an environment in which a computer program can exist almost as a person.

To return to our first question, then: does this community-built character make the final creation something human? Vocaloids cannot talk in natural tones, and there is little emphasis on artificial intelligence or self-awareness, so they lack the things that generally establish a ‘human.’ Yet there are humans who can’t speak without help, and some who can no longer use their mental faculties; this doesn’t make them Vocaloids. I think there is some overlap between humans and Vocaloids, given the way the community has come to construct them, but they are far from being pure humans. Then again, given our increasingly complex reliance on, and interaction with, technology, it’s hard to say that any one person in the world today is a ‘pure human’ either; we are very much a mixture, so we may be more similar than it would seem.

I'll end with the first Vocaloid song I ever decided to get. It's a remix and cover of the theme from a science fiction film called Paprika, directed by a veteran animator who recently passed away. This version sounds almost more human than the original. If anything, it's a mixture.









Work Cited:    

Hayles, N. Katherine. 'How We Think: Transforming Power and Digital Technologies.' Web. 2 March 2011.



EMusician: Humanoid or Vocaloid?
NY Times Music: Could I get that Song in Elvis, please?
Tokyohive: ...debate about Vocaloid artists through controversial tweets
WordNetSearch: Intelligence
Yamaha English FAQ


Liz McCormack

group generation of character

Thank you for this remarkable set of examples and analysis of a sector of technology and art that exemplifies Hayles' new "digital humanities." The Vocaloids expanded my grip on many of her concepts and ideas: the collective creation, the need for broad-spanning skill sets, the role of cultural contexts, the generative aspects of collaborative work. The example of the Vocaloid concert was mind-bending. It made me wonder what the live musicians thought and felt about the enterprise.

It also struck me how this creative work doesn't necessarily require personal relationships with your collaborators; one can pick up and use what's been put out there. Is there a difference between sharing technological tools and true collaboration? Are there different flavors of collaboration that technologies open up or make possible? Or is it the same kind of collaboration, just wearing a new outfit? Can these new blends of technology and art provide new insights into cultures and history? If we "followed the data" as Hayles implores us to do, what would we learn about the overlaps and distinctions between Western and Asian cultures?