Changes

About

6,056 bytes added, 09:46, 20 December 2018
no edit summary
=== Contents ===
 
==== Tanzil Project ====
Tanzil is a Quranic project launched in early 2007 to produce a highly verified Unicode Quran text to be used in Quranic websites and applications. Our mission in the Tanzil project is to produce a standard Unicode Quran text and serve as a reliable source for this standard text on the web. [http://tanzil.net/docs/Tanzil_Project| Tanzil Project]
The goal of this website is to provide the first online, authentic, searchable, and multilingual (English/Arabic at the individual hadith level) database of collections of hadith from our beloved Prophet Muhammad, peace and blessings be upon him.
=== Structure of Concept Tree ===[[http://corpus.quran.com/ corpus.quran.com]] an annotated linguistic resource which shows the Arabic grammar, syntax and morphology for each word in the Holy Quran.
=== Data Mining Tools ===
==== [https://en.wikipedia.org/wiki/Data_mining Data Mining] ====
This The information in this site was produced using [https://en.wikipedia.org/wiki/Data_mining|data mining], Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. The following tools were used for data mining
==== [https://www.r-project.org/ The R Project for Statistical Computing] ====
=== Wikimedia and Presentation Team ====
Special thanks to
==== [https://www.linkedin.com/in/sophivorus/ Felipe Schenone] ====
==== [https://www.linkedin.com/in/karuna-saxena-406495153/ Karuna Saxena] ====
==== [https://www.linkedin.com/in/%D0%BF%D0%B0%D0%B2%D0%B5%D0%BB-%D0%B0%D1%81%D1%82%D0%B0%D1%85%D0%BE%D0%B2-5a013089/ Pavel Astakhov] ====
=== Translation and Editing Team ===
 == Status and Progress == HodHood Islamic Encyclopedia Project is a team effort that will build and maintain a platform for all Islamic Sciences, in as many languages as possible or there is authentic relevant material. At the early stage it will form a container for all written or published books, and eventually grow the project to translate contents. The main add-value to the e-library is using data mining techniques to facilitate a better access to Islamic material for our students specifically and the English speaking Muslim community in general.  == Challenges ==The conventional search engines are for the most part general purpose search engines and may not be a good source of authentic Islamic contents. Additionally, our students may have to spend lots of time sifting thru the millions of results returned by conventional search engines like google.  Data mining techniques, are very powerful tools, however, they do expect the contents to be consistent to get the desired results. We intend to use data mining techniques to classify our contents (Quran, Hadith, Fiqh, Islamic Finance etc.) and tag the contents with appropriate classification for faster and more accurate indexing and retrieval. Data Mining Techniques will also provide a summary and visual representation of the contents to assist our student to comprehend the material and create a visual map of the topics.  === Translation ===Unfortunately, the English Islamic contents are primarily a translation from Arabic with many inconsistencies among translators. Primarily because the English language lacks some commonly used letters like “ض” or “ع” which may be translated to “A” or in some cases “U” and may be preceded by a ‘or divided into two syllabuses by a “-“. And in many cases the names may be spelled differently in the same document.  The English language also lacks lots of common Arabic terms, “خشوع”, “حياء” or “زكاة” for example.  We have built a Spell Checker utility to insure consistency among translations, or at least a produce consistent translation of Islamic Terms. The utility considers the following criteria’s for suggested spelling namely1) How the original word sound compared to how the suggested word sounds (Phonetics)2) How many operations needs to be performed before the original word can be transferred to the suggested word, commonly known in the field as “Word Distance”. 3) The Word Similarity. [[File:Spellchk.jpg|1000px|center]] to use the full version of the file table showing the Spell Checker Recommendation Utility output. Column 1 “Original Word”, Column 2 “Suggested Word”, Column 3: Original Word “Sound Like”, Column4: “Suggested Word Sound Like”. Column5: matching column 3 to column 4. Column 6: Similarity Distance between Original and Editing Team Suggested Word, column 7: Final Recommendation. ===Transliteration ===  === Dealing with Transliteration and Arabic Names ===Some of the challenges that we managed to partially resolve is the inconsistent spelling among translation for the Arabic names or Islamic terms. Islamic Material contains lots of transliteration from the Arabic language. Translators opt to transliterate in the absence of proper English translation or in cases where they cite Sacred Text like Quran or Hadith. We have manually searched for transliteration and replaced it with English translation. We developed an automated process to identify and translate the Transliteration. However, the process is not very reliable and we manually checked the material for Transliteration. The process depends on “Spell Checking” and if a word is flagged as misspelled, then that is in indication of possible transliteration. === Content Classification ===We also managed to device a process where we can automatically classify contents into topics (Content Classification). I have implemented this process in a section of the contents with very promising results. We have also managed to automatically identify association between topics and structure the topic similarity into (Association Rules).  === Voice to Text Engine ===Previously we have successfully built an indexing engine to transfer spoken words to written text. This allows for searching and indexing of media files. It also subjects media file to the Content Classification process and Association Rules processes. we have encountered many challenges in this area that made us decide to postpone this project until further notice. some of the challenges are listed here 1- The technology selected, Namely Microsoft MAVIS - supports the English language only. 2- The English Voice content we selected, or available, is not clear or of a very poor recording quality. The Quality of recording effected the MAVIS Engine produced text. 3- Most of the English material belongs to non-native English speakers with a thick accent, this made it very hard to recognize the spoken words. 4- None native English speakers also have a tendency to miss-pronounce certain English words due to their mother tongue pronunciation of words or the lack of certain English letters in there native language. 5- Islamic Media Contents also had lots of Transliterated Phrases like - سبحان الله -, this also tremendously effected the quality of the final results.  === This Wiki ===We are now building a sample web site based on Web 3.0 technology namely (MediaWiki) and incorporating the intelligence mentioned above using the built in Semantic Capabilities of MediaWiki.  We have also successfully demonstrated and used the multi language capabilities using the built in features of MediaWiki.