From: "Charles Kelemen"
Date: Thu, 3 Feb 2005 10:52:40 -0500
To: job-opps@cs.swarthmore.edu
Subject: [JOB OPP] [sporterfield@jhu.edu: CLSP Summer Workshop Opportunity]

FYI. Rich knows lots about this.
--cfk

----- Forwarded message from Sue Porterfield -----

To: Sue Porterfield
From: Sue Porterfield
Date: Mon, 24 Jan 2005 10:25:12 -0500
Subject: CLSP Summer Workshop Opportunity

Dear Colleague:

The Center for Language and Speech Processing at the Johns Hopkins University is offering a unique summer internship opportunity, which we would like you to bring to the attention of your best students in the current junior class. Only three weeks remain for students to apply for these internships.
This internship is unique in the sense that the selected students will participate in cutting-edge research as full members alongside leading scientists from industry, academia, and the government. The exciting nature of the internship is the exposure of the undergraduate students to the emerging fields of language engineering, such as automatic speech recognition (ASR), natural language processing (NLP), and machine translation (MT). We are specifically looking to attract new talent into the field and, as such, do not require the students to have prior knowledge of language engineering technology.

Please take a few moments to nominate suitable bright students who may be interested in this internship. On-line applications for the program can be found at http://www.clsp.jhu.edu/ along with additional information regarding plans for the 2005 Workshop and information on past workshops. The application deadline is February 11, 2005.

If you have questions, please contact us by phone (410-516-4237), by e-mail (sporterfield@jhu.edu), or via the Internet (http://www.clsp.jhu.edu/).

Sincerely,

Frederick Jelinek
J.S. Smith Professor and Director

Project Descriptions for this Summer

1. Parsing Arabic Dialects
-----------------------------------------

The Arabic language exhibits diglossia, i.e., the coexistence of two forms of language, a variety with standard orthography and sociopolitical clout which is not natively spoken by anyone (Modern Standard Arabic, MSA) and varieties that are primarily spoken and lack writing standards (Arabic dialects). To give an example from English, the contrast is similar to the contrast between African American dialect and Broadcast American English. The dialects and MSA form a continuum of variation at the lexical, phonological, morphological, and syntactic levels. Our project aims at discovering ways of parsing Arabic dialects, i.e., of automatically determining the underlying structure of a sentence.
There are important resources currently available for MSA with much ongoing NLP work; for example, there are several syntactic and semantic parsers for MSA. However, Arabic dialect resources and NLP research are still in their infancy. There are few written corpora available for the dialects, partly because of the lack of standard orthographies. There are linguistic studies of Arabic dialectal syntax but there is no language engineering work (such as computational grammars). Our approach uses the MSA resources, knowledge of the linguistics of the dialect (syntax, morphology, lexicon, phonology), and machine learning in marshalling the MSA resources and the linguistic knowledge.

The undergraduates on the project will be given a broad exposure to linguistic and computational research, while working closely on a particular problem with the senior members of the project. Knowledge of Arabic is not required, but interest in linguistic issues is desirable.

2. Syntax-Driven Statistical Machine Translation
---------------------------------------------

Automatic ("machine") translation from one language to another is one of the most difficult and most fascinating problems in computer science. Despite decades of research, and a great deal of progress, the output of machine translation (MT) systems is often incomprehensible. A number of exciting recent advances in science and engineering, along with the burgeoning information landscape, have made machine translation ripe for a leap forward. The goal of our workshop is to develop and integrate a number of new techniques in a clean and flexible framework, in order to catalyze a leap in MT quality.

The workshop will be very hands-on, pushing theory into algorithms, into software, and into experimental results. All team members, junior and senior, will have strong analytical and programming skills.
The diversity of topics that are relevant to MT virtually guarantees that everyone's work will match their talents and interests, while remaining part of a collegial group working towards a common goal. The ability to read a foreign language, especially French or Arabic, would be an asset, but is not mandatory.

3. Parsing and Spoken Structural Event Detection
----------------------------------------------

Even though speech-recognition accuracy has improved significantly over the past 10 years, these systems do not currently generate/model structural information (meta-data) such as sentence boundaries (e.g., periods) or the form of a disfluency (e.g., in "I want [to go] * {I mean} meet with Fred", "to go" is an edit, which is signaled by an interruption point indicated as *, as well as an edit term "I mean"). Automatic detection of these phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for the downstream text processing modules. Similarly, constraints imposed by text processing systems such as parsers can be used to assign certain types of meta-data for correct identification of disfluencies.

The goal of this workshop is to investigate the enrichment of speech recognition output using parsing constraints and the improvement of parsing accuracy due to speech recognition enrichment. We will investigate the following questions: (1) How does the incorporation of syntactic knowledge affect sentence boundary and disfluency detection accuracy? (2) How does the availability of more accurate sentence boundaries and disfluency annotation affect parsing accuracy?

This workshop project is interdisciplinary, bringing together researchers from the speech recognition and natural language processing communities. The undergraduates on this project will be exposed to research that spans these two important areas, and will gain experience with approaches to interfacing between technologies in these two areas.
----- End forwarded message -----

Charles F. Kelemen, Edward Hicks Magill Professor
Chair, Computer Science Department
Swarthmore College
500 College Avenue
Swarthmore, PA 19081
610-328-8515
cfk@cs.swarthmore.edu
kelemen@swarthmore.edu