olstrans Project Guide

27 October, 2000

OLS Transcription Project Team

The olstrans project (Ottawa Linux Symposium Transcription Project) seeks to provide high-quality, technically accurate transcripts of the recorded sessions from the Ottawa Linux Symposium. This guide describes the methods used by this project, as well as the tools used to produce the transcripts. It is the primary source of information for all transcriptionists and quality auditors interested in joining this project. This is a draft document.


Distribution

Methods of distribution

Distribution of materials from this project will be done through the project website (http://olstrans.sourceforge.net) and may also be done through the Ottawa Linux Symposium website (http://www.ottawalinuxsymposium.org) and/or FTP server (ftp://ftp.ottawalinuxsymposium.org). Distribution will be self-serve: people may visit these sites to download the completed transcripts.

Notification of completed transcripts

An announcement for each completed transcript (see "When is a transcript complete?", below) will be posted to the workers mailing list. A news item will be added to the project news page (http://olstrans.sourceforge.net/news.php3) and to the project news page at SourceForge. The project administrator is responsible for posting these announcements.

Final products of this project

The final products of this project will be high-quality transcripts of the recorded sessions from OLS. The LyX format (.lyx) file shall be considered the master for all corrections, updates and submissions. Transcripts will also be provided in HTML (.html), DocBook SGML (.sgml), ASCII text (.txt), PostScript (.ps) and PDF (.pdf) formats; other formats may be added based on the requests received.


The Team

Transcriptionists

The first stage in producing a transcript is the transcription. People who perform this task are referred to as transcriptionists. It is the transcriptionist's job first to listen to the recorded audio and build a transcript, then to check its contents.

Professional transcriptionists (such as those in the medical field) usually do the checking during the initial transcription. This project's goals differ considerably from theirs, though: professional transcriptionists are usually graded on the quantity of transcripts they produce, whereas the output of this project is limited in quantity and is meant to be of high quality.

Transcriptionists should have a good ear and a reasonable technical understanding of the content. One need not be an expert on the subject, but one should be able to identify common acronyms and technical terms. Major mistakes should be caught in the self-audit that transcriptionists perform after finishing each transcript.

Quality Auditors

Once a transcript has been created by a member of the transcription team, it is passed to a quality auditor, who listens to the recorded audio while following along in the transcript, looking for errors (such as spelling mistakes, incorrect words and improper formatting). The QA team is responsible for verifying that the transcript is ready for release.

Project Administrator

The project administrator is responsible for posting completed transcripts and assisting transcriptionists and quality auditors in maintaining a high level of quality throughout the project. Any issues encountered by members of the transcription team or the QA team should be referred to the project administrator. Outside questions regarding the project should also be directed to the project administrator.

Joining the team

A guide to joining the team may be found on the olstrans project website (in the "join us" section).


Transcription Tools

MP3 playback software

MP3 playback software is required for both transcriptionists and QA staff.

The audio recordings from the Ottawa Linux Symposium have been provided in MP3 format, so MP3 playback software is needed to play them back. One example of this type of software is XMMS. Ideally, your MP3 player should offer a simple way to pause, rewind and fast-forward in small increments (such as five seconds). XMMS provides these controls: 'c' pauses, the right arrow key skips forward five seconds and the left arrow key skips back five seconds. These features are important because the recordings may contain garbled or unintelligible segments that you will need to replay several times in order to transcribe them properly.

LyX

LyX is required for both transcriptionists and QA staff.

All transcripts for this project are produced using LyX. LyX is a document production system which outputs LaTeX files. LyX was selected as the tool for this project due to its consistent interface, easy-to-understand tutorials, ready availability, and output format. Files produced by LyX can easily be exported to DocBook SGML, PostScript, ASCII text and HTML, as well as a number of other formats.

Information about LyX may be found at http://www.lyx.org

LyX document template

Through the use of LyX's document template capability, we have created a shell document which allows the transcriptionist to save time when starting new transcripts. All transcripts should be done using the shell document, to ensure similar formatting and consistency between transcripts.

The LyX document template for this project is available at: http://olstrans.sourceforge.net/olstrans.lyx

CVS

CVS (Concurrent Versions System) is a version control system that stores files in a way that lets you easily track and manage development work, modifications and additions. The files for this project are stored on the cvs.olstrans.sourceforge.net CVS server. Details regarding the CVS system employed by this project may be found at: http://olstrans.sourceforge.net/cvs.php3

Dictionaries and Spell Checking

Transcriptionists and quality auditors are encouraged to use one of the many English dictionary resources available online to maintain a high level of quality in transcripts. LyX's spell checking features should also be used. Please verify the spelling of any proper names and define any acronyms used in the transcript.


Transcription Process

The basic idea

The basic idea of making a transcript is quite simple. Listen to what the speaker says and key this content into the provided shell. Fill in the other bold/caps sections of the shell. Once the whole session has been keyed in, check it over for obvious errors, then post it for a quality auditor to check. Follow the markup rules in the 'Markup' section of this document, and look at a completed transcript if you want to see what your finished product should look like.

Markup

Time markers

At the end of each paragraph within the body of the transcript, a time offset is listed, corresponding to that point in the MP3 recording of the presentation. This time marker is emphasized (in document formats that support emphasis) and placed within brackets at the very end of the paragraph. For example, [05m, 30s] indicates that the paragraph ends at the five-minute, thirty-second mark in the MP3 recording.
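The marker format can be sketched in Python. This is a minimal illustration, not a project tool: the function names `format_time_marker` and `ends_with_time_marker` are hypothetical, and the format (two-digit, zero-padded minutes and seconds) is inferred from the single example above. Only the plain-text form of the marker is handled; the emphasis applied in LyX is not represented.

```python
import re

# A trailing marker such as "[05m, 30s]". Assumes minutes and seconds are
# zero-padded to at least two digits, as in the guide's example.
TIME_MARKER = re.compile(r"\[\d{2,}m, \d{2}s\]$")


def format_time_marker(offset_seconds: int) -> str:
    """Return the bracketed marker for an offset into the MP3 recording."""
    minutes, seconds = divmod(offset_seconds, 60)
    # Assumption: sessions longer than an hour keep counting minutes
    # (e.g. "[75m, 02s]") rather than gaining an hours field.
    return f"[{minutes:02d}m, {seconds:02d}s]"


def ends_with_time_marker(paragraph: str) -> bool:
    """True if the paragraph's last non-blank text is a time marker."""
    return TIME_MARKER.search(paragraph.rstrip()) is not None


print(format_time_marker(330))  # [05m, 30s]
```

A quality auditor could run a check like `ends_with_time_marker` over each paragraph of a plain-text export to spot paragraphs that are missing their marker.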

Questions and comments from the audience

These recordings were created using a bud microphone attached to the speaker during their presentation. Due to the inherent range limitations of this type of microphone, some of the comments and questions from the audience are unintelligible. In cases where the speaker repeats the audience question, the question shall be omitted and a marker will be left in its place. Events which happen in the audience shall be bracketed, such as: [Audience applauds.]

Further, where audience comments or questions are not repeated by the speaker, they shall be included in the transcript and enclosed within double quotes to indicate that the statements come from the audience, not from the speaker.

Editorial notes

The transcriptionist and any quality auditors who have examined a transcript may each include editorial notes within it. These shall be placed within brackets and shall begin with 'ED:'. For example: [ED: The author is referring to sliced cheese, not grated cheese.]
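Because every editorial note begins with `[ED:`, notes are easy to collect mechanically, for example to review them all before release. The hypothetical `editorial_notes` helper below is a sketch that assumes notes never contain square brackets of their own.

```python
import re

# Matches an editorial note such as "[ED: The author is referring to ...]".
# Assumption: the body of a note contains no "]" characters.
ED_NOTE = re.compile(r"\[ED:\s*([^\]]*)\]")


def editorial_notes(text: str) -> list[str]:
    """Return the body of every editorial note, in document order."""
    return [body.strip() for body in ED_NOTE.findall(text)]


sample = ("We use the cheddar. [ED: The author is referring to sliced "
          "cheese, not grated cheese.] [05m, 30s]")
print(editorial_notes(sample))
```

Time markers such as `[05m, 30s]` are not matched, because the pattern requires the literal `ED:` prefix.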

Paragraph breaks

The paragraph breaks within the transcript are largely arbitrary; in many cases they represent pauses or breaks in the speaker's speech. In other cases, they have been inserted to make the transcript easier to read.

Speech corrections by the speaker

During the course of the talk, the speaker may correct himself or herself. In these cases, the speech being corrected will be placed in parentheses. Readers may usually skip the parenthesized sections, as they contain speech that the speaker has retracted. For example: My aunt once had (a dog named Spot, sorry) a cat named Cleopatra.
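The reading suggested above, skipping the parenthesized sections, can be illustrated with a small Python sketch. The helper `without_corrections` is hypothetical, and it assumes that parentheses in the transcript body are used only for self-corrections and are never nested. Text that uses parentheses for other purposes would be mangled.

```python
import re

# Assumption: parentheses in the transcript body mark only speaker
# self-corrections, and corrections are never nested.
CORRECTION = re.compile(r"\s*\([^()]*\)")


def without_corrections(sentence: str) -> str:
    """Drop parenthesized self-corrections (and the space before them)."""
    return CORRECTION.sub("", sentence)


print(without_corrections(
    "My aunt once had (a dog named Spot, sorry) a cat named Cleopatra."))
# My aunt once had a cat named Cleopatra.
```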

Unintelligible speech

In sections where the speech of the speaker or audience is deemed useful but unintelligible by the transcriptionist or the quality auditor, the marker [unintelligible] will be inserted in its place. Several attempts will be made to decipher words and phrases of this nature. Where the unintelligible words or phrases are clearly unimportant to the meaning of the sentence, they may be omitted without inserting a marker.
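Because each paragraph ends with a time marker, an auditor can find the stretches of audio worth re-listening to by pairing each [unintelligible] marker with the time marker of its paragraph. The sketch below uses a hypothetical helper, `unintelligible_spots`. It assumes the transcript has already been split into plain-text paragraphs, for example from an ASCII export. The gap lies somewhere before the reported offset.

```python
import re

# A trailing marker such as "[02m, 10s]", as described under "Time markers".
TIME_MARKER = re.compile(r"\[\d{2,}m, \d{2}s\]$")


def unintelligible_spots(paragraphs):
    """Return (time marker, gap count) for each paragraph with gaps.

    The time marker gives the end of the paragraph, so each gap lies
    somewhere before that offset in the recording. None is used in place
    of the marker when a paragraph has no trailing time marker.
    """
    spots = []
    for para in paragraphs:
        gaps = para.count("[unintelligible]")
        if gaps:
            match = TIME_MARKER.search(para.rstrip())
            spots.append((match.group(0) if match else None, gaps))
    return spots


paragraphs = [
    "The [unintelligible] driver loads first. [02m, 10s]",
    "Nothing is missing here. [02m, 40s]",
    "Both [unintelligible] and [unintelligible] matter. [03m, 05s]",
]
print(unintelligible_spots(paragraphs))
```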

Using the document template

To use the provided LyX document template, start LyX and select the File -> New from template menu option. In the file dialog, select olstrans.lyx, the template provided for this project.

Bold/caps sections of the template are those which the transcriptionist is expected to fill in. Speaker name, speaker bio, session abstract and scheduling details are available from the OLS website.

Transcript document layout

The document layout should be preserved from that of the template. Transcripts have four basic sections: the preamble (the title, speaker name, date and abstract), the introduction and overview (section 1, including the explanation of the markup, transcript licensing and information about the transcriptionist), the transcript, and the resource listing.

Checking your work

When checking your work, it is often best to listen through the entire session a second time. The process used during the initial set of transcripts was: transcribe, spell check, fix markup (split paragraphs as needed); listen to the session a second time, inserting time markers and making corrections as needed.

Submitting your work

Project files for the olstrans project are split into four categories, each corresponding to a subdirectory in this project's CVS repository: in-progress transcription, transcripts pending QA, QA work files, and completed transcripts. While working, transcriptionists are welcome to place files in the transcription directory for safekeeping; since it may take several sittings to finish the initial transcript, storing the files off site may help you recover from a catastrophic failure. Once the transcriptionist has finished a transcript, it is moved to the pending-qa directory. Quality auditors move these files to the qa-work directory once they have notified the list of their intent to QA that session. Finally, once the QA process is finished, the file is moved to the completed directory, which holds content ready for release. After release, the completed copies may continue to be updated and re-released as errors are found.
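The order in which a transcript moves through the four directories can be summarized as a small state machine. This Python sketch is illustrative only: the directory names are taken from the description above and may not match the exact paths in the CVS repository, and `next_stage` is a hypothetical helper.

```python
# The four CVS subdirectories, in the order a transcript moves through them.
# Assumption: names follow the prose of this guide, not verified CVS paths.
STAGES = ["transcription", "pending-qa", "qa-work", "completed"]


def next_stage(current: str) -> str:
    """Return the directory a transcript moves to after `current`.

    Moving from pending-qa to qa-work also requires the auditor to notify
    the workers list first; that step is a human one and is not modeled.
    """
    index = STAGES.index(current)  # raises ValueError for unknown names
    if index == len(STAGES) - 1:
        raise ValueError("completed transcripts stay in 'completed'; "
                         "later fixes are made there and re-released")
    return STAGES[index + 1]


print(next_stage("pending-qa"))  # qa-work
```

Note that CVS itself has no rename command: a file is moved between directories by adding it in its new location and removing it from the old one, then committing both changes.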

Persons not comfortable with using CVS may forward their content to the project administrator, who will gladly commit changes as requested.

When is a transcript complete?

A transcript is considered complete once it has been moved to the completed directory. Once in the completed directory, the transcript may be reviewed by the project administrator and then be released officially. Incomplete transcripts should not be released to the general public. Completed transcripts will become available from the olstrans project download page.

How soon do I need to be finished with this?

Since this is a volunteer project, we have no time requirements; please feel free to work at your own pace. However, as the end of the current transcription and QA work approaches, the project administrator may elect to reassign a task if no changes have been posted to CVS for that task within the last month. This policy helps prevent a single person from holding up the project. By claiming work (posting to the workers list), you state your intent to do that work; please try to work steadily on the claimed task until it is completed.
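The reassignment rule amounts to a simple date comparison. This sketch makes several assumptions. "Within the last month" is taken to mean 30 days, and `may_be_reassigned` is a hypothetical helper. The final decision remains with the project administrator, who applies the rule only as the end of the current work approaches.

```python
from datetime import date, timedelta

# Assumption: "within the last month" is read as 30 days.
IDLE_LIMIT = timedelta(days=30)


def may_be_reassigned(last_cvs_change: date, today: date) -> bool:
    """True if a claimed task has had no CVS changes for over a month."""
    return today - last_cvs_change > IDLE_LIMIT


print(may_be_reassigned(date(2000, 9, 1), date(2000, 10, 27)))  # True
```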

Contacting the speaker

Please do not contact the speaker directly regarding the transcript. Instead, send your questions for the speaker to the project administrator, who will act on your behalf, either contacting the speaker or finding an adequate answer in some other way. This policy helps avoid annoying the speaker and gives the olstrans project a more unified public image.


Quality Assurance

Reporting bugs

If you are involved in the QA process, you may submit your results in one of two ways. First, if you have obtained CVS access for this project, you may make changes directly to the master file under CVS. Second, you may post issues to the bug tracking system; these will usually be handled within four days (96 hours) of posting. If you wish to propose specific corrections, please post them to the patch manager.

Please be thorough in your QA work. If a word or phrase appears to be amiss, consider it a bug and report it using the bug tracking system. The transcriptionist or project administrator will gladly examine the potential bug and determine whether the transcript is, indeed, incorrect.

How should markup issues be reported?

Markup issues (incorrect document formatting in one of the LyX masters) should be reported using the bug tracking system.

How should conversion issues be reported?

Conversion issues (bugs present in a format other than LyX, where the LyX master appears to be correct) should be reported using the bug tracking system and flagged with a high priority.

Preserving misprints in direct quotations

The content in the speaker bio and the abstract for each of the talks is taken verbatim from the OLS website. This content will not be altered, even though it may contain errors. Like a number of other projects that include verbatim copies of web content, we do not modify directly copied content, so that the original source can be more easily found using a search engine.


Questions

Questions or comments regarding this project should be directed to the project administrator, Jacob Moorman


This Document

History

This document was first posted on 24 October, 2000 and was originally written by Jacob Moorman.

* 27 October, 2000: 12 minor modifications designed to improve clarity.

* 26 October, 2000: 3 minor modifications designed to improve clarity.