Return-Path: 
X-Original-To: job-opps-relayxyz-outgoing
Delivered-To: job-opps-relayxyz-outgoing@cs.swarthmore.edu
Received: by allspice.cs.swarthmore.edu (Postfix, from userid 1442)
	id 9D20DF349; Thu,  3 Feb 2005 10:52:40 -0500 (EST)
X-Original-To: job-opps@cs.swarthmore.edu
Delivered-To: job-opps@cs.swarthmore.edu
From: "Charles Kelemen" 
Date: Thu, 3 Feb 2005 10:52:40 -0500
To: job-opps@cs.swarthmore.edu
Subject: [JOB OPP] [sporterfield@jhu.edu: CLSP Summer Workshop Opportunity]
Message-ID: <20050203155240.GB3033@cs.swarthmore.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.6+20040907i
Sender: owner-job-opps@cs.swarthmore.edu
Precedence: bulk
Reply-To: "Charles Kelemen" 

FYI.  Rich knows lots about this.

--cfk

----- Forwarded message from Sue Porterfield  -----

To: Sue Porterfield 
From: Sue Porterfield 
Date: Mon, 24 Jan 2005 10:25:12 -0500
Subject: CLSP Summer Workshop Opportunity
X-Original-To: cfk@cs.swarthmore.edu
Delivered-To: cfk@cs.swarthmore.edu
X-IronPort-AV: i="3.88,147,1102309200"; 
   d="pdf'?scan'208"; a="40249081:sNHT85468776"
X-MIMEOLE: Produced By Microsoft MimeOLE V5.50.4807.1700
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-Priority: 3 (Normal)
X-MSMail-priority: Normal
X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on 
	allspice.cs.swarthmore.edu
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=unavailable 
	version=3.0.2

Dear Colleague:

The Center for Language and Speech Processing at the Johns Hopkins
University is offering a unique summer internship opportunity, which we
would like you to bring to the attention of your best students in the
current junior class. Only three weeks remain for students to apply for
these internships.

This internship is unique in the sense that the selected students will
participate in cutting-edge research as full members alongside leading
scientists from industry, academia, and the government. The exciting nature
of the internship is the exposure of the undergraduate students to the
emerging fields of language engineering, such as automatic speech
recognition (ASR), natural language processing (NLP), and machine
translation (MT).

We are specifically looking to attract new talent into the field and, as
such, do not require the students to have prior knowledge of language
engineering technology. Please take a few moments to nominate suitable
bright students who may be interested in this internship. On-line
applications for the program can be found at http://www.clsp.jhu.edu/ along
with additional information regarding plans for the 2005 Workshop and
information on past workshops. The application deadline is February 11,
2005.

If you have questions, please contact us by phone (410-516-4237), by e-mail
(sporterfield@jhu.edu), or via the Internet at http://www.clsp.jhu.edu/

Sincerely,
Frederick Jelinek
J.S. Smith Professor and Director


Project Descriptions for this Summer

1.  Parsing Arabic Dialects
-----------------------------------------
The Arabic language exhibits diglossia, i.e., the coexistence of two forms
of language, a variety with standard orthography and sociopolitical clout
which is not natively spoken by anyone (Modern Standard Arabic, MSA) and
varieties that are primarily spoken and lack writing standards (Arabic
dialects). To give an example from English, the contrast is similar to the
contrast between African American dialect and Broadcast American English.
The dialects and MSA form a continuum of variation at the lexical,
phonological, morphological, and syntactic levels. Our project aims at
discovering ways of parsing Arabic dialects, i.e., of automatically
determining the underlying structure of a sentence.  There are important
resources currently available for MSA with much on-going NLP work; for
example, there are several syntactic and semantic parsers for MSA. However,
Arabic dialect resources and NLP research are still in their infancy.
There are few written corpora available for the dialects, partly because of
the lack of standard orthographies.  There are linguistic studies of Arabic
dialectal syntax but there is no language engineering work (such as
computational grammars).  Our approach uses the MSA resources, knowledge of
the linguistics of the dialect (syntax, morphology, lexicon, phonology), and
machine learning in marshalling the MSA resources and the linguistic
knowledge.  The undergraduates on the project will be given a broad exposure
to linguistic and computational research, while working closely on a
particular problem with the senior members of the project.  Knowledge of
Arabic is not required, but interest in linguistic issues is desirable.


2.  Syntax-Driven Statistical Machine Translation
---------------------------------------------
Automatic ("machine") translation from one language to another is one of the
most difficult and most fascinating problems in computer science.  Despite
decades of research, and a great deal of progress, the output of machine
translation (MT) systems is often incomprehensible.  A number of exciting
recent advances in science and engineering, along with the burgeoning
information landscape, have made machine translation ripe for a leap
forward.  The goal of our workshop is to develop and integrate a number of
new techniques in a clean and flexible framework, in order to catalyze a
leap in MT quality.  The workshop will be very hands-on, pushing theory into
algorithms, into software, and into experimental results.  All team members,
junior and senior, will have strong analytical and programming skills.  The
diversity of topics that are relevant to MT virtually guarantees that
everyone's work will match their talents and interests, while remaining part
of a collegial group working towards a common goal.  The ability to read a
foreign language, especially French or Arabic, would be an asset, but is not
mandatory.


3.  Parsing and Spoken Structural Event Detection
----------------------------------------------
Even though speech-recognition accuracy has improved significantly over the
past 10 years, these systems do not currently generate/model structural
information (meta-data) such as sentence boundaries (e.g., periods) or the
form of a disfluency (e.g., in "I want [to go] * {I mean} meet with Fred",
"to go" is an edit, which is signaled by an interruption point indicated as
*, as well as an edit term "I mean.").  Automatic detection of these
phenomena would simultaneously improve parsing accuracy and provide a
mechanism for cleaning up transcriptions for the downstream text processing
modules.  Similarly, constraints imposed by text processing systems such as
parsers can be used to assign certain types of meta-data for correct
identification of disfluencies.

The goal of this workshop is to investigate the enrichment of speech
recognition output using parsing constraints and the improvement of parsing
accuracy due to speech recognition enrichment.  We will investigate the
following questions: (1) How does the incorporation of syntactic knowledge
affect sentence boundary and disfluency detection accuracy? (2) How does the
availability of more accurate sentence boundaries and disfluency annotation
affect parsing accuracy?  This workshop project is interdisciplinary,
bringing together researchers from the speech recognition and natural
language processing communities.  The undergraduates on this project will be
exposed to research that spans these two important areas, and will gain
experience with approaches to interfacing between technologies in these two
areas.




----- End forwarded message -----
Charles F. Kelemen, Edward Hicks Magill Professor
Chair, Computer Science Department
Swarthmore College 
500 College Avenue			
Swarthmore, PA  19081 
610-328-8515   
cfk@cs.swarthmore.edu
kelemen@swarthmore.edu
________________________________________________________________________