MACHINE NATURAL LANGUAGE TRANSLATION USING WIKIPEDIA AS A PARALLEL CORPUS: A FOCUS ON SWAHILI

  • Type: Project
  • Department: Management
  • Project ID: MGT0079
  • Access Fee: ₦5,000 ($14)
  • Pages: 114 Pages
  • Format: Microsoft Word
  • Views: 1K

For more Info, call us on
+234 8130 686 500
or
+234 8093 423 853

The government of Kenya has undertaken an ambitious project to equip children with laptops and tablets to facilitate electronic-based learning. This initiative can only bear fruit if there is content relevant to the studies being undertaken. Many Kenyans learn English as a second language; their mother tongue is Swahili or another African language. Content in Swahili therefore allows a better and deeper understanding of the subject matter. Much of the academic content already exists, albeit in English, so translating it is the most practical way of making it available in Swahili. This is especially so since the content is not necessarily new and only needs to be rendered in Swahili.

Machine translation engines such as Microsoft Translator and Google Translate already exist and aim to make this task easier. However, African languages are generally under-represented in these engines. Their output is comparatively inaccurate when translating into African languages, and even more so when translating academic content. This can largely be attributed to the data used to train the engines. Many machine translation engines rely on corpora made up of phrases found in everyday speech, in which academic terms are not adequately represented.

Wikipedia, an online crowdsourced encyclopedia, offers a good source of data for translation work. This study has shown that using Wikipedia as a corpus can provide a viable source of data for academic translation, particularly for African languages.

Therefore, this project modeled an English-to-Swahili translation engine that uses Wikipedia as its source of translation corpus data. To be clear, this study did not set out to create yet another translation engine, but to improve on, and complement, a small aspect of existing engines. The approach was to compare corresponding articles across Wikipedia's language editions and build a parallel corpus, which is then used to create a translation database. It is worth noting that Wikipedia on its own cannot provide a comprehensive data set for any machine translation engine. As a proof of concept, this model performs English-to-Swahili translation, and preliminary results are presented here. Further work is required to align the output more accurately and to combine it in a way that ensures fluency and accuracy.
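The corpus-building step described above can be illustrated with a minimal sketch. It is not the study's implementation: the article texts are toy, unvetted examples, the function names are hypothetical, and pairing sentences by position is a deliberately naive stand-in for whatever alignment method the project actually used.

```python
import re

# Hypothetical toy data: each entry pairs the text of an English article with
# the text of the Swahili article on the same topic. In practice such pairs
# would be gathered from Wikipedia; these sentences are illustrative only and
# are not data from the study.
ARTICLE_PAIRS = [
    (
        "Water is a liquid. Water is essential for life.",
        "Maji ni kiowevu. Maji ni muhimu kwa uhai.",
    ),
    (
        "The sun is a star.",
        "Jua ni nyota.",
    ),
    (
        # Sentence counts differ here, so this pair is skipped below.
        "Kenya is a country. It is in East Africa.",
        "Kenya ni nchi.",
    ),
]


def split_sentences(text):
    """Split text into sentences after ., ! or ? (a crude heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def build_parallel_corpus(article_pairs):
    """Pair sentences by position, using only articles with equal sentence counts."""
    corpus = []
    for english_text, swahili_text in article_pairs:
        english_sentences = split_sentences(english_text)
        swahili_sentences = split_sentences(swahili_text)
        if len(english_sentences) != len(swahili_sentences):
            continue  # Naive alignment cannot handle this pair; leave it out.
        corpus.extend(zip(english_sentences, swahili_sentences))
    return corpus


def build_translation_db(corpus):
    """Map each lower-cased English sentence to the first Swahili match seen."""
    database = {}
    for english_sentence, swahili_sentence in corpus:
        database.setdefault(english_sentence.lower(), swahili_sentence)
    return database


def translate(sentence, database):
    """Look up an exact sentence match; return None when nothing is stored."""
    return database.get(sentence.strip().lower())


corpus = build_parallel_corpus(ARTICLE_PAIRS)
database = build_translation_db(corpus)
print(translate("The sun is a star.", database))
```

An exact-match lookup like this only covers sentences already present in the corpus, which is consistent with the point above that Wikipedia alone cannot supply a comprehensive data set.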

This study was further motivated by the directive of the Communications Authority of Kenya that at least 60% of media content should be local. This content therefore needs to be translated into local languages for presentation. The study proposes a solution that can be scaled to learn and translate other local languages.

Finally, it is worth noting that Kenya, like many other developing countries, imports numerous products from foreign countries. Many of these products have labels and instructions written in foreign languages, most often English. This poses a potential threat to consumers who do not understand these languages, for example in the case of medical drugs.
