or ¿Cuáles son las Funciones del Arbolado Urbano? ("What are the Functions of Urban Trees?") via Systran; for those who don’t read Spanish, Google Translate comes up with a much better version.
I’m being provocative, of course… that was the example that really jumped out at me when I was working through a text in Systran and comparing it to my own attempt at translating it. The reality is that the text produced by the programme was 95% easy to understand – it just wasn’t very English. The example in the title was the most noticeable case of a word being misunderstood or mistranslated, but the sentence grammar tended to follow the Spanish original rather than more natural English constructions. The output would have been perfectly suitable for gisting.
I hadn’t used desktop-based machine translation software before (as opposed to web-based), so it was interesting to see the options available to the user. The programme allows users to build custom dictionaries to ensure that terms are translated correctly. As I am familiar with statistics-based translation memory and terminology management packages, I was a bit surprised to see the programme rate my term choices, warning me if its internal dictionary disagreed with them. In the limited time we had in the session, I was unable to feed many terms into the dictionary, but feeding in a large termbase would seem to have a lot of potential to improve accuracy and consistency.
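The idea behind a custom dictionary can be sketched very simply. This is just a toy illustration of enforcing preferred term choices before or after translation – the dictionary entries are invented for the example, and it makes no claim about how Systran actually implements its dictionaries:

```python
import re

# Hypothetical custom dictionary: preferred English renderings for
# Spanish source terms (invented entries, purely for illustration).
custom_dictionary = {
    "arbolado urbano": "urban trees",
    "zona verde": "green space",
}

def apply_dictionary(text, dictionary):
    """Replace each source term with its preferred translation,
    longest terms first so multi-word entries win over sub-terms."""
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        text = pattern.sub(dictionary[term], text)
    return text

print(apply_dictionary("Funciones del arbolado urbano", custom_dictionary))
# -> Funciones del urban trees
```

A real system would of course integrate the termbase into the translation process itself rather than doing a blunt find-and-replace, but the consistency benefit is the same: one entry governs every occurrence.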
What could aid the programme would be the ability to define the sentence structures that are preferred in certain circumstances; this could improve the ‘smoothness’ of the English text. This is the theory behind statistics-based machine translation. According to today’s lecture, statistical approaches have not yet proved any more effective than rule-based approaches, but a large enough corpus must surely have the potential to deliver results – I’d be interested to see whether there is any research on how language can be measured statistically and whether this could be the key to the machine translation holy grail.
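One way language can be “measured statistically” is by counting word sequences in a corpus and preferring the candidate sentence whose sequences the corpus has seen more often. The sketch below is a deliberately tiny version of that idea – three made-up corpus sentences and a crude bigram count, nothing like a real statistical MT system:

```python
from collections import Counter

# Tiny illustrative "corpus"; a real system would use millions of sentences.
corpus = [
    "what are the functions of urban trees",
    "the functions of the program are clear",
    "urban trees are important",
]

bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))

def fluency(sentence):
    """Count how many of the sentence's bigrams appear in the corpus;
    a crude stand-in for a statistical language-model score."""
    words = sentence.split()
    return sum(bigrams[b] > 0 for b in zip(words, words[1:]))

# A natural English ordering scores higher than a Spanish-shaped one.
natural = "what are the functions of urban trees"
literal = "what are the functions of the trees urban"
print(fluency(natural), fluency(literal))  # -> 6 5
```

This is exactly where corpus size matters: with only three sentences the scores barely differ, but over a large corpus the natural construction would be seen vastly more often than the calque.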
I suppose my own opinion follows the ‘official line’: machine translation is a reality and it will grow in capability and significance, but it can’t do every kind of translation work. However, the need for translation in today’s globalised world is too great to be satisfied by human translators alone. We will increasingly see genres of translation that humans will not be paid to do; human translators need to focus on the more linguistically complex texts.
I’ve spent most of the weekend working with Framemaker – the consensus seems to be that it is not the easiest programme to use, but that it is good for scientific and technical texts. With that in mind, I chose an article from a scientific journal to re-typeset.
This was my first Framemaker project, so I had to learn a lot of the basics. These are the steps I went through:
- First I had to define the master pages, ensuring that the pagination was set to give me both left and right master pages. I added some graphics to give the article consistency and to make it more attractive.
- Following this step, I copied in the text of the article; it had been saved as a text file to strip out the formatting of the original PDF.
- The next task was to select the fonts and create my ‘catalog’ of character formats; this was straightforward enough, if slightly tedious.
- I then had to ‘paint’ the formats on to the text.
- By now it was obvious that my original three-column layout was just too narrow for the text, so I shrank the graphics and switched to a two-column layout, taking a centimetre from the margin of the text.
- I’ve managed to include some landscape format pages for images that are too large and detailed to fit into the main body of the text.
The next challenge is to work out how to redistribute the tables and diagrams, ensuring that they are still coherent with the text; layout and content are often closely linked.
I have recently started a module on language engineering – by that I mean the automatic processing of natural language by a computer.
We looked at the different applications of this technology. It is important, and I hadn’t thought about just how widespread it is. The list of examples includes predictive texting, video indexing (searching for words within videos rather than just their titles, all thanks to… Google, of course), machine translation, speech recognition and OCR. This is clearly a huge field, and there are opportunities for employment for linguists in this area. We were given these sites to look at:
I hadn’t realised quite how much we use language engineering in fairly everyday technology. Online translation is one of the most visible tools, and it is becoming more usable – if I’m looking at a website in a language I don’t understand, it’s quick and easy to click for a translation. However, I hadn’t thought about how it is used in applications like predictive texting or spell checking. Obviously these applications need linguistic information to work.
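The linguistic information behind predictive texting can be as simple as word frequencies. Here is a minimal sketch – the frequency numbers are invented, whereas a real phone keyboard would draw on large corpora of actual usage:

```python
from collections import Counter

# Toy frequency list (invented counts, purely for illustration).
word_counts = Counter({
    "translation": 50, "translator": 30, "transcript": 10,
    "language": 40, "layout": 5,
})

def predict(prefix, counts, n=2):
    """Return the n most frequent words starting with the typed prefix."""
    matches = [w for w in counts if w.startswith(prefix)]
    return sorted(matches, key=lambda w: counts[w], reverse=True)[:n]

print(predict("trans", word_counts))  # -> ['translation', 'translator']
```

Spell checking works from the same kind of resource in reverse: a typed word that appears nowhere in the word list is flagged, and nearby high-frequency words are offered as corrections.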
What really struck me was Google’s Audio Indexing, which you can see at http://labs.google.com/gaudi – I suppose it’s similar to the technology in speech recognition software, applied to the audio track of video files. The potential of this for building up a spoken-word corpus is huge.
In the next post on this subject we’ll have a look at the first topic, automatic (ish) term extraction…
We have started to look at Publishing Skills as part of the MSc. The logic is that as a freelance translator, it is beneficial to be able to offer DTP to your clients (at a price, naturally). The reality is that professional documents created for print are not likely to be produced in MS Word, so it is clearly beneficial for translators to have a grasp of DTP software and be able to produce professional-looking documents.
There are a number of software packages available. InDesign is commonly used, but expensive (£700-ish). Another programme that was suggested was Serif’s Page Plus (cheaper at £79.99). Scribus was suggested as a free open-source alternative.
The tool that we will be learning to use on the course is Adobe Framemaker; this is apparently often preferred for producing technical documents and is considered particularly good at handling formulae.
Our attention was drawn to the site http://desktoppub.about.com/
Having started to use the software, I can see that what is particularly important in DTP is defining and applying consistent styles. This makes consistency much easier to maintain across the whole document and makes any changes to the formatting much simpler. A recent instance where the use of styles made a translation much simpler was a job I tried to feed into Trados Studio 2009 from a PDF. Trados made a stab at extracting the text from the PDF, but the formatting did not survive in any acceptable way. I was still able to translate the text, but my final exported version looked terrible. The way I solved the problem was to copy the text from the PDF into a Word document, define and apply styles so that the Word file matched the PDF original, and then feed the nicely formatted Word document back into Trados. The translation memory did the rest of the work for me and left the formatting alone.
Framemaker is much more powerful than Word in terms of defining styles. I have been looking at character tags, which can be used to mark certain elements within paragraphs, e.g. italic, bold or underlined text; these can then be applied consistently across the whole document. Paragraph tags can then be used for different types of paragraph – titles, quotes, bullet points etc. – and again, these styles can be applied across the document.
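The economy of tag-based styling – define a format once, apply it everywhere by name – can be sketched outside any particular DTP tool. This toy example (invented style names and rendering, not Framemaker’s actual model) shows why one change to a style definition reformats every tagged paragraph:

```python
# Hypothetical style sheet: each named style is defined exactly once.
styles = {
    "Heading": {"font": "Helvetica", "size": 14},
    "Body":    {"font": "Times", "size": 10},
}

# Paragraphs carry only a style name, never their own formatting.
document = [
    ("Heading", "Machine translation"),
    ("Body", "The output was suitable for gisting."),
]

def render(doc, styles):
    """Resolve each paragraph's tag against the style sheet at render time."""
    return [f"[{styles[tag]['font']} {styles[tag]['size']}pt] {text}"
            for tag, text in doc]

# One change to the style definition reformats every "Body" paragraph.
styles["Body"]["size"] = 11
print("\n".join(render(document, styles)))
```

Hard-formatted text – where each paragraph stores its own font and size – has no such single point of change, which is exactly why the PDF-to-Word-to-Trados rescue above depended on rebuilding the styles first.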
I am hoping to see how I can now apply this to the documents that I produce, and also to find out what ‘good’ formatting is – so I need to get reading; these are from the reading list:
I’ve written about ‘Translation in Global News’ by Esperança Bielsa and Susan Bassnett and “Translation and Conflict” by Mona Baker. Bearing these books in mind, I had a look at how the global media reported the Honduran coup in June 2009. See how many different narratives you can find: