TERMINOLOGY

-Internet: The world's largest system of interconnected networks (or internetworks), all of which use the TCP/IP protocols (Transmission Control Protocol/Internet Protocol).

-Rights: Legal entitlements or powers held or exercised with respect to digital materials, such as copyright, privacy, confidentiality, and national or corporate restrictions imposed for security reasons.

-Authenticity: Assurance that digital materials are genuine and trustworthy, that is, that they are what they are claimed to be, whether as an original object or as a faithful and reliable copy of an original made through fully documented processes.

-Certification: The process of assessing the degree to which a preservation programme meets a previously agreed set of minimum standards or practices.

-Data protection: Operations intended to safeguard the binary digits that make up digital objects against loss or unauthorized modification.

-Digital heritage: The set of digital materials valuable enough to be kept so that they can be consulted and used in the future.

-Digital preservation: Actions intended to maintain the accessibility of digital objects over the long term.

-Identity of digital objects: The characteristic that makes it possible to distinguish one digital object from all others, including other versions or copies of the same content.

-Ingest: The operation of storing digital objects, and their related documentation, in a safe and orderly manner.

-Verification: The act of checking whether a digital object, in a given file format, is complete and complies with the format specification.
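
As a hedged illustration of the verification entry above, the sketch below checks whether a byte string looks like a complete PNG file. The choice of PNG and the name `looks_like_complete_png` are assumptions made for this example. The check is deliberately shallow (fixed signature at the start, IEND chunk at the end) and is not a full validation against the format specification.

```python
import struct
import zlib

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Every PNG file ends with an empty IEND chunk:
# a 4-byte length (0), the chunk type, and a CRC-32 computed over the type.
PNG_TRAILER = (
    struct.pack(">I", 0) + b"IEND" + struct.pack(">I", zlib.crc32(b"IEND"))
)


def looks_like_complete_png(data: bytes) -> bool:
    """Shallow check: expected signature at the start, IEND chunk at the end."""
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_TRAILER)
```

A truncated download would fail the second condition, which is the "is it complete?" half of the definition; checking every chunk in between would be the "does it comply?" half.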

Digital documentation vs “traditional” documentation

We say that information is digital when it is encoded in a format that a computer can interpret; it is usually described as consisting of series of zeros and ones.
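
To make the "series of zeros and ones" concrete, here is a minimal sketch that encodes a short text as bits and decodes it again. The function names and the choice of UTF-8 are assumptions made for this example.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Encode text and render each byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Inverse of to_bits: parse space-separated bytes and decode them."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode(encoding)


print(to_bits("Hi"))  # 01001000 01101001
```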

Likewise, we say that digital information is online when it can be accessed from remote terminals or computers, over local area networks, wide area networks, or combinations of both.

We should ask what specifically distinguishes digital information from other kinds of information and other types of media. More precisely: what are the properties of digital information compared with analogue information? Three properties have been proposed: computability, virtuality and capacity.

A first consequence follows from the computability of digital information: when information is in this format we can carry out search and discovery operations that would be impossible with analogue information.
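
One such search operation can be sketched as a tiny inverted index. This is only an illustration under stated assumptions: the function names and the whitespace-and-punctuation tokenization are choices made for this example, not a description of any particular search system.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, *terms):
    """Return the ids of the documents that contain every search term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()
```

With paper documents the same query means reading every page; here the lookup cost depends on the number of terms, not on the size of the collection.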

Paper can hold text and still images (it is in fact the ideal medium for them), but not sound or moving images; videotape can hold both of the latter, but is very inefficient for text or still images. The digital medium is the only one that can hold every form of information. This suggests that, if the web is still predominantly textual, this will not last long: it will become increasingly audiovisual without ceasing to be textual.

 

Virtuality also has its problems. First, it causes headaches for authors and publishers because of how easy copying becomes. It also affects libraries and documentation centres, which must deal with new restrictions on copying and reproducing digital information. At times it has even been questioned whether intellectual property rights make sense on the Net.

 

Moreover, a digital document degrades “catastrophically”. A single erroneous bit in a file of hundreds of pages, or a small speck of dust touching the surface of a magnetic medium, can make it completely unreadable, at least with the means available to an ordinary person.
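
A single flipped bit is also easy to detect if a fingerprint of the file was stored beforehand. The sketch below simulates the damage and compares SHA-256 digests; the helper names `flip_bit` and `fingerprint` are assumptions made for this example, and detecting the error does not by itself repair the file.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def fingerprint(data: bytes) -> str:
    """SHA-256 digest, used here as a fixity value for the content."""
    return hashlib.sha256(data).hexdigest()


original = b"digital document"
damaged = flip_bit(original, 0)

# The two byte strings differ in one bit, and their digests no longer match.
print(fingerprint(original) == fingerprint(damaged))  # False
```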

 

Finally, virtuality sometimes makes it difficult to determine the boundaries of a digital document. Whereas analogue documents are clearly discrete objects with well-defined limits, the same is not always true of digital documents.

 

 

Sources:

http://www.msinfo.info/propuestas/documentos/documentos_digitales.html

http://dialnet.unirioja.es/servlet/articulo?codigo=1071179

http://www.elprofesionaldelainformacion.com/contenidos/2001/diciembre/5.pdf

 

Social bookmarking

Social bookmarking is a method for Internet users to share, organize, search, and manage bookmarks of web resources. Unlike file sharing, the resources themselves aren’t shared, merely bookmarks that reference them.

 

Descriptions may be added to these bookmarks in the form of metadata, so that other users may understand the content of the resource without first needing to download it for themselves. Such descriptions may be free text comments, votes in favor of or against its quality, or tags that collectively or collaboratively become a folksonomy.

 

In a social bookmarking system, users save links to web pages that they want to remember and/or share. These bookmarks are usually public, and can be saved privately, shared only with specified people or groups, shared only inside certain networks, or another combination of public and private domains. The allowed people can usually view these bookmarks chronologically, by category or tags, or via a search engine.

Most social bookmark services encourage users to organize their bookmarks with informal tags instead of the traditional browser-based system of folders, although some services feature categories/folders or a combination of folders and tags.

 

As these services have matured and grown more popular, they have added extra features such as ratings and comments on bookmarks, the ability to import and export bookmarks from browsers, emailing of bookmarks, web annotation, and groups or other social network features.
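
The sharing model described above (public by default, optionally private or shared with specific people, browsable by tag) can be sketched with a small data structure. This is a hypothetical model written for this example; the class and function names are assumptions and do not correspond to any particular bookmarking service.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Bookmark:
    url: str
    owner: str
    tags: Set[str] = field(default_factory=set)
    # None means public; an empty set means private to the owner.
    shared_with: Optional[Set[str]] = None


def visible_to(bookmarks, user):
    """Bookmarks that the given user is allowed to see, in saved order."""
    return [
        b for b in bookmarks
        if b.shared_with is None or user == b.owner or user in b.shared_with
    ]


def with_tag(bookmarks, tag):
    """Bookmarks carrying the given tag."""
    return [b for b in bookmarks if tag in b.tags]
```

Tags are stored as a flat set rather than a folder path, which mirrors the informal tagging most services encourage.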

Advantages

With regard to creating a high-quality search engine, a social bookmarking system has several advantages over traditional automated resource location and classification software, such as search engine spiders. All tag-based classification of Internet resources (such as web sites) is done by human beings, who understand the content of the resource, as opposed to software, which algorithmically attempts to determine the meaning of a resource. Also, people can find and bookmark web pages that have not yet been noticed or indexed by web spiders.

For users, social bookmarking can be useful as a way to access a consolidated set of bookmarks from various computers, organize large numbers of bookmarks, and share bookmarks with contacts. Libraries have found social bookmarking to be useful as an easy way to provide lists of informative links to patrons.

Disadvantages

From the point of view of search data, there are drawbacks to such tag-based systems: no standard set of keywords (i.e., a folksonomy instead of a controlled vocabulary), no standard for the structure of such tags (e.g., singular vs. plural, capitalization), mistagging due to spelling errors, tags that can have more than one meaning, unclear tags due to synonym/antonym confusion, unorthodox and personalized tag schemata from some users, and no mechanism for users to indicate hierarchical relationships between tags (e.g., a site might be labeled as both cheese and cheddar, with no mechanism that might indicate that cheddar is a refinement or sub-class of cheese). 
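
Some of these drawbacks can be reduced on the service side. The sketch below normalizes case, whitespace and simple English plurals, and uses a hand-written table of broader terms to recover the cheese/cheddar relationship. The `BROADER` table and both function names are assumptions made for this example, and the plural rule is deliberately naive.

```python
# Hypothetical hand-maintained table: tag -> broader tag.
BROADER = {"cheddar": "cheese"}


def normalize_tag(tag: str) -> str:
    """Lowercase, trim, and strip a simple trailing plural 's'."""
    tag = tag.strip().lower()
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


def expand_tags(tags):
    """Normalize each tag and add every broader tag listed in BROADER."""
    result = set()
    for raw in tags:
        tag = normalize_tag(raw)
        # Walk up the hierarchy; stop at the top or at a tag already seen.
        while tag is not None and tag not in result:
            result.add(tag)
            tag = BROADER.get(tag)
    return result


print(sorted(expand_tags(["Cheddar", " cheeses "])))  # ['cheddar', 'cheese']
```

A table like `BROADER` is in effect a small controlled vocabulary layered on top of the folksonomy, so it brings back the maintenance cost that free tagging avoids.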

 

Sources:

-www.wikipedia.com

-Machine translation (MT) is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, it attempts more complex translations, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.

-Machine-aided translation is translation in which translation proper is performed by a computer, even if a human helps by pre-editing, post-editing, or answering questions to disambiguate the source text. In Computer-Aided Translation, or more precisely Machine-Aided Human Translation (MAHT), by contrast, translation is performed by a human, and the computer offers supporting tools.

-Multilingual Content Management systems contain information, mostly in the form of more or less structured text documents, but potentially also including audio clips, video clips and images. Minimally, such a system provides mechanisms for storage and retrieval of content data, but it may also give support for indexing of documents, distributed document editing, version management, and generation of different views and guided tours. 

Finally…

-Translation technology is technology that provides translation between two languages. Its aim is to translate spoken language into other languages simultaneously. Researchers have demonstrated a directional speaker system that delivers a translated audio feed to just one person in a room, removing the need for them to wear headphones. Another concept device projects translated subtitles along the bottom of one lens of a modified pair of glasses.
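
The simple word substitution mentioned in the machine translation entry above can be sketched in a few lines. The toy English-to-Spanish lexicon below is invented for this example and is not taken from any real MT system.

```python
# Hypothetical toy lexicon (English to Spanish), invented for this example.
LEXICON = {
    "tough": "dura",
    "sentence": "oración",
    "for": "para",
    "gun": "arma",
    "possession": "posesión",
}


def translate_word_for_word(text, lexicon=LEXICON):
    """Replace each word by its lexicon entry; unknown words pass through."""
    return " ".join(lexicon.get(word.lower(), word) for word in text.split())


print(translate_word_for_word("Tough sentence for gun possession"))
# dura oración para arma posesión
```

The output keeps English word order and picks the grammatical sense of "sentence" (“oración”) where the legal sense (“sentencia”) is needed, which is the kind of error that phrase-level and corpus-based techniques try to avoid.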

 

 

 

 

Sources:

-Machine translation. (2008, April 7). In Wikipedia, The Free Encyclopedia. Retrieved April 9, 2008, 11.50, from http://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=203927830

-MCM Project, Multilingual Content Management, Vaxjo University (WSCC). Retrieved April 10, 12.00, from http://wscc.info/index.php?show=53044_SWE&&page_anchor=http://wscc.info/p53044/p53044_swe.php

-Kitsite, Multilingual Content Management (2007). Retrieved April 10, 12.05, from http://www.kitsite.com/articles/multilingual-content-management.html

-Christian Boitet, 8.4 Machine-aided Human Translation. Retrieved April 12, 13.40, from http://cslu.cse.ogi.edu/HLTsurvey/ch8node6.html

-Will Knight, NewScientist.com news service, Live speech-translation technology unveiled, 18:05 31 October 2005. Retrieved April 12, 12.38, from http://www.newscientist.com/article.ns?id=dn8241

 

Example of a translation from Galician into a closely related language, Spanish:


Oito galegos secuestrados en Somalia

Catro persoas lograron acceder ao atuneiro vasco armados con lanzagranadas e manteñen retida á tripulación do ‘Praia Bakio’, composta por 13 persoas de orixe africana, oito galegos e cinco vascos. As autoridades españolas non teñen constancia de que ningún dos 26 resulte ferido durante o asalto.
A pesar de que o atuneiro sufriu danos materiais durante o asalto, os danos non impiden o seu navegabilidad e gobernabilidade e, segundo o seguimento que se lle está facendo, os primeiros indicios apuntan a que o buque diríxese cara a terra firme.


Ocho gallegos secuestrados en Somalia

Cuatro personas lograron acceder al atunero vasco armados con lanzagranadas y mantienen retenida a la tripulación del ‘Playa Bakio’, compuesta por 13 personas de origen africana, ocho gallegos y cinco vascos. Las autoridades españolas no tienen constancia de que ninguno de los 26 resulte herido durante el asalto.
A pesar de que el atunero sufrió daños materiales durante el asalto, los daños no impiden su navegabilidad y gobernabilidad y, según el seguimiento que se le está haciendo, los primeros indicios apuntan a que el buque se dirige cara a tierra firme.

As can be seen, there is only one mistake in the translation, so it can be said that it is easier to obtain good results when translating between related languages. Translations between less closely related languages, on the other hand, usually contain typical grammatical, syntactic or word-order flaws.
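
How close two related languages are on the surface can be roughly illustrated by comparing word sequences. The sketch below uses `difflib.SequenceMatcher` from the Python standard library on the two headlines above; the function name `token_similarity` and the English headline are assumptions added for comparison. Token overlap is only a crude proxy and says nothing about translation quality.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Similarity ratio (0.0 to 1.0) between two lowercased word sequences."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


galician = "Oito galegos secuestrados en Somalia"
spanish = "Ocho gallegos secuestrados en Somalia"
english = "Eight Galicians kidnapped in Somalia"  # added for comparison only

# Three of the five words are identical in Galician and Spanish,
# while only "Somalia" is shared with the English headline.
print(token_similarity(galician, spanish) > token_similarity(galician, english))  # True
```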

Sources:

-Comprendium translator, 21 April 2008, 12:57, from http://www.translendium.net:8080/home/text.do;jsessionid=E9A5ABD30639A486D97FC1824833274F

-A Nosa Terra diario, 21 April 2008, 12.30, from http://www.anosaterra.org/nova/oito-galegos-secuestrados-en-somalia-.html



The following are the MT systems I have used and the translations I produced with them, using web pages that offer a good range of translation options:

Instituto Cervantes. Servicio de traducción automática interactivo. Powered by AutomaticTrans.

Translation result:
Original text:
“Tough sentence for gun possession welcomed(1)

The Leader of Nottingham City Council has welcomed a lengthy jail sentence handed down to a man caught with a loaded gun in Nottingham.(2)

Councillor Jon Collins said he hoped it would act as a deterrent to others considering carrying guns.

Makan Dayil, 28, of Beardsley Gardens, The Meadows, was jailed for ten years following his conviction at Nottingham Crown Court for possessing the gun and ammunition, after police(3) found a revolver loaded with five bullets in his car on Queens Drive, Nottingham, on(4) August 29, last year.”

Translated text:

“Oración resistente para la posesión del arma dada la bienvenida

El líder del consejo de ciudad de Nottingham ha dado la bienvenida a una oración de cárcel muy larga dada abajo a un hombre cogido con un arma cargado en Nottingham.

El concejal Jon Collins dijo él esperaba que actuara como impedimento a otros que consideran el llevar de los armas.

Makan Dayil, 28, de los jardines de Beardsley, los prados, fue encarcelado por diez años que seguían su convicción en la corte de la corona de Nottingham para poseer el arma y la munición, después de que el policía encontrara un revólver cargado con cinco balas en su coche en reinas conduzca, Nottingham, de agosto el 29, el año pasado”.

-There are some obvious errors in the translation:

  1. The headline of the article has been wrongly translated:
    • “sentence” in this case does not mean “oración” but “sentencia”.
    • “tough” in this case does not mean “resistente” but “dura”.
    • “for” has been wrongly translated as “para” instead of “por”.

  2. In the first paragraph:
    • “sentence” in this case does not mean “oración” but “sentencia”.
    • “handed down” has been wrongly translated as “dada abajo” instead of “dada”.
    • “caught” has been wrongly translated as “cogido” instead of “pillado”.
    • the adjective “loaded” has been translated as masculine, but it should be feminine here: “loaded gun” = “arma cargada”.

  3. In the third paragraph:
    • “police” has been translated as “el policía”, as if it were a singular noun, but it is a collective noun: “la policía”.
    • The preposition “on” has been wrongly translated as “de” instead of “el”.

 

 

 

Sources:

-Nottingham City Council, “Tough sentence for gun possession welcomed”, April 16, 12:00, from http://www.nottinghamcity.gov.uk/news_page/news_about_nottingham_-_policing_and_public_safety_/tough_sentence_for_gun_possession_welcomed.htm

-Instituto Cervantes, Servicio de traducción automática interactivo, 16 April 2008, 12:00, from http://oesi.cervantes.es/traduccionAutomatica.html

The characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent who receives the translation.

According to FEMTI, the Framework for the Evaluation of Machine Translation in ISLE, the three main characteristics of a translation task are:

  1. Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a relatively large volume of texts produced by people outside the organization, usually in several languages.
  2. Dissemination: The ultimate aim of dissemination is to deliver to others a translation of documents produced inside the organization.
  3. Communication: The purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

 

Sources:

-FEMTI – a Framework for the Evaluation of Machine Translation in ISLE, April 7, 12,10 from http://www.issco.unige.ch:8080/cocoon/femti/st-home.html

In this article I will give the requested explanation of three of the topics we have discussed recently:

The first topic I’m going to talk about is “Humaine”, the “Human-machine interaction Network on emotions”, one of the current projects of the German Research Center for Artificial Intelligence.

Humaine aims to lay the foundations for European development of systems that can register, model and influence human emotional and emotion-related states and processes – ‘emotion-oriented systems’. Such systems may be central to future interfaces, but their conceptual underpinnings are not sufficiently advanced to be sure of their real potential or the best way to develop them.

One of the reasons is that relevant knowledge is dispersed across many disciplines. Humaine brings together leading experts from the key disciplines in a programme designed to achieve intellectual integration. It identifies six thematic areas that cut across traditional groupings and offer a framework for an appropriate division of labour – theory of emotion; signal/sign interfaces; the structure of emotionally coloured interactions; emotion in cognition and action; emotion in communication and persuasion; and usability of emotion-oriented systems. Teams linked to each area will run a workshop in it and carry out joint research to define an exemplar embodying guiding principles for future work in their area.

The second topic I am going to focus on is “Whiteboard”, a completed project of the same research centre. This project focused on “Multilevel annotation for dynamic free text processing”.

The project aimed at designing, implementing, investigating and evaluating a new system architecture that facilitated the combination of different language technologies for a range of practical applications. Language technologies offered numerous means for a partial analysis of texts that could be employed for information retrieval, information extraction, language checking, and many other applications. Processing methods and tools differed along several dimensions, e.g., with respect to levels of linguistic description, depth of analysis, or the way knowledge of language was derived (linguistically or statistically).

Methods often overlapped in their functionality but differed in their strengths and weaknesses. Finding optimal combinations of heterogeneous techniques and processing components was one of the most difficult tasks in language processing, and it was the challenge taken up by the Whiteboard project. The novel architecture developed and explored in Whiteboard was based on the concept of an annotated text. The different LT components enriched an XML-encoded text with annotations. Each component could exploit or disregard previously assigned annotations. The architecture had a single shared data structure, which was at the same time the input, throughput, and output of the system. The envisaged architecture permitted the pragmatic combination of different processing approaches, most notably novel ways of combining shallow and deep methods.
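
The idea of a single shared, annotated text can be sketched as follows. This is not the Whiteboard implementation: the element names (`document`, `text`, `layer`, `token`, `span`) and the two toy components are assumptions invented for this example, which only shows components adding annotation layers to one XML structure and optionally reusing earlier layers.

```python
import xml.etree.ElementTree as ET


def tokenizer(doc):
    """Shallow component: adds a 'tokens' layer with character offsets."""
    text = doc.findtext("text")
    layer = ET.SubElement(doc, "layer", name="tokens")
    position = 0
    for word in text.split():
        start = text.index(word, position)
        end = start + len(word)
        ET.SubElement(layer, "token", start=str(start), end=str(end))
        position = end


def capitalized_marker(doc):
    """Second component: exploits the 'tokens' layer if it is present."""
    text = doc.findtext("text")
    tokens = doc.find("layer[@name='tokens']")
    if tokens is None:
        return  # nothing to build on, so this component adds no layer
    layer = ET.SubElement(doc, "layer", name="capitalized")
    for token in tokens:
        start, end = int(token.get("start")), int(token.get("end"))
        if text[start].isupper():
            ET.SubElement(layer, "span", start=str(start), end=str(end))


def run_pipeline(text, components):
    """The same XML document is the input and output of every component."""
    doc = ET.Element("document")
    ET.SubElement(doc, "text").text = text
    for component in components:
        component(doc)
    return doc
```

Because annotations refer to the text by character offsets instead of modifying it, components can be reordered or dropped without invalidating one another's layers.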

Finally, the last topic I have picked is “Neca”, “The net environment for embodied emotional conversational agents”, one of the previous projects of the Austrian Research Institute for Artificial Intelligence.

The objective of the NECA project was to develop a new generation of mixed multi-user / multi-agent virtual spaces populated by affective conversational agents. The agents are able to express themselves through synchronised emotional speech and non-verbal expression, generated from an abstract representation. This is the first time that such expressive capabilities are featured in Internet applications. The agents’ usefulness was evaluated in two concrete application scenarios. From a technical point of view, the NECA platform provides a confederation of dedicated components including an affective reasoner, co-ordinated generation of verbal and nonverbal aspects of communication, and emotional speech synthesis, thus providing a basis for the development of new Internet applications with emotional agents.

Sources:

http://www.dfki.de/pas/f2w.cgi?ltp/humaine-e

http://www.dfki.de/pas/f2w.cgi?ltc/whiteboard-e

http://www.ofai.at/research/nlu/projects/nlproject_neca.html

This article points out recent research topics mentioned on different Human Language Technologies sites.

Among research centres, the following are the most notable:

- At the German Research Center for Artificial Intelligence (DFKI), the following themes are the most developed in research:

  • exploiting – and automatically extending – ontologies for content processing.
  • tighter integration of shallow and deep techniques in processing.
  • enriching deep processing with statistical methods.
  • combining language checking with structuring tools in document authoring.
  • document indexing for German and English.
  • automatically associating recognized information with related information and thus building up collective knowledge.
  • automatically structuring and visualizing extracted information.
  • processing information encoded in multiple languages, among them Chinese and Japanese.

- The Stanford Natural Language Processing Group, in California, works in several areas:

  • Basic research in computational linguistics.
  • Grammar induction.
  • Sentence understanding.
  • Word sense disambiguation.
  • Automatic question answering.

- The Edinburgh Language Technology Group produces research in the following areas:

  • Combining Shallow Semantics and Domain Knowledge.
  • Text Mining for Biomedical Content Curation.
  • Cross-retail Multi-agent Retail Comparison.
  • Smart Qualitative Data: Methods and Community Tools for Data Mark-up.
  • Machine Learning for Named Entity Recognition.
  • Integrated Models and Tools for Fine-Grained Prosody in Discourse.
  • Joint Action Science and Technology.
  • AMI Consortium projects that are developing technologies for meeting browsing and for assisting people who participate in meetings from a remote location.
  • A study of how pairs collaborate when planning a route on a map (Collaborating using diagrams).

Among the most prominent research networks, the most notable is probably:

- The European Network of Excellence in Human Language Technologies (ELSNET), a platform set up to pursue the following goals:

  • Analyse the present situation and future prospects.
  • Share knowledge and experience.
  • Work out innovative actions.
  • Carry out a joint examination of the environment.
  • Bring together the Human Language Technologies community to make European research and development possible.

Among the associations, the Spanish Society for Natural Language Processing (SEPLN) is also of great importance and studies these themes:

  • Resolving lexical ambiguity.
  • Retrieving relevant information.
  • Linguistic techniques for working with multilingualism.
  • Linguistic knowledge for dealing with semantic errors.

Finally, among the latest conferences on Natural Language Processing, I have focused on the “24th Annual Conference of the Spanish Society for Natural Language Processing 2008 (SEPLN ‘08)”.

-The main thematic areas of this conference were:

    • Linguistic, mathematical and psycholinguistic models of language.
    • Corpus linguistics.
    • Machine translation.
    • Speech recognition.
    • Semantics, pragmatics and discourse.
    • Industrial applications of NLP.
    • Automatic analysis of textual content.

Sources:

http://www-nlp.stanford.edu/

http://www.ltg.ed.ac.uk/projects

http://www.dfki.de/lt/projects.php

http://www.elsnet.org/

http://www.sepln.org/

http://basesdatos.uc3m.es/sepln2008/web/

Research centres all over Europe play an important role, and looking at them lets us go deeper into Human Language Technologies.

These are four of the main European research centres for Human Language Technologies that I have found on the net:

- The National Centre for Language Technology (Ireland): its aim is to conduct research into the processing of human language by computers, with applications such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, computer-assisted language teaching and learning, and software localisation and globalisation.

- The Edinburgh Language Technology Group is a research and development group that has been working in the area of natural language engineering since the early 1990s. It was originally established as part of the Human Communication Research Centre, and is now based in the Institute for Communicating and Collaborative Systems of the Division of Informatics, University of Edinburgh, one of the largest communities of natural language processing specialists in Europe.

- Language Technology Documentation Centre in Finland: to make speech-to-speech translation (SST) a reality, concerted action by key European actors in SST technologies is to be undertaken, organized along the following themes:

  • technology and service development for SST components and SST systems: development of platforms and creation of services.
  • research in SST technologies: improving the performance of speech recognition, speech synthesis and speech-centred translation.
  • language resources (LR) for many languages: making available the speech databases, corpora and lexica needed to develop SST components, to evaluate their performance and to transfer such SST components to other languages.
  • technology dissemination: creating infrastructure to support the fast spread of SST technology.

- Language Technology Group: Language Technology (LT) has been a major research area at the Austrian Research Institute for Artificial Intelligence (OFAI) since its foundation in 1984. The group carries out research in modelling and processing human languages, especially German. This includes constructing linguistic resources (such as lexicons, grammars, discourse models), processing algorithms (such as morphological components, parsers, generators, speech synthesizers, discourse processing components), and application prototypes (such as natural language interfaces, advisory systems and concept-to-speech systems).

The Language Technology Group at OFAI is a member of the EU’s European Network of Excellence in Human Language Technologies (ELSNET).

Sources:

  • http://www.dfki.de/lt/lt-general.php
  • http://www.nclt.dcu.ie/
  • http://www.ofai.at/research/nlu/
  • http://www.ltg.ed.ac.uk/
  • http://www.ling.helsinki.fi/filt/projects/index-en.shtml