Yue Wang

I am actively looking for a job! Please contact me if you believe I could be a good fit for your position.

Email:  wangyue AT udel.edu
Phone:  +1 (302) 740-1688
Office:  209 Evans Hall, Newark, DE 19716, USA
Web:  http://www.eecis.udel.edu/~yuewang
Résumé:  A PDF version of my Résumé is here


I am a Ph.D. candidate working with Prof. Hui Fang in the Electrical & Computer Engineering Department at the University of Delaware. My research interests include Information Retrieval and Data Mining. I received my Master's degree from the University of Southern Denmark.

My name is Yue Wang (王玥, wáng yuè, in Chinese). I am a sixth-year Ph.D. candidate working with Prof. Hui Fang at the InfoLab in the ECE Department at the University of Delaware. Our group works on topics related to information management, such as information retrieval, knowledge bases, data mining, and biomedical informatics.

I received my Master's degree in Electrical Engineering from the Mads Clausen Institute at the University of Southern Denmark. My Master's thesis, "Sensor Networks in Biomechatronics - Hexapod Robot", was supervised by Prof. Arne Bilberg. The project is part of the EU-funded EMICAB project. I received my Bachelor's degree in Electrical Engineering from Beijing University of Technology in 2008. I also received a Bachelor's degree in Information Technology from Mikkeli University of Applied Sciences in Finland.

I have a wide range of interests. Traveling, hiking, and photography are my hobbies.


  • Yue Wang and Hui Fang. Combining Term-based and Concept-based Representation for Clinical Retrieval. To appear in Proceedings of the 2017 Text REtrieval Conference, 2017. (TREC'17, No.7 group in 2017 TREC Precision Medicine track)

  • Yue Wang, Peilin Yang, and Hui Fang. Evaluating Axiomatic Retrieval Models in the Core Track. To appear in Proceedings of the 2017 Text REtrieval Conference, 2017. (TREC'17, No.5 group in 2017 TREC Common Core track)

  • Yue Wang, Hongning Wang, and Hui Fang. Extracting User-Reported Mobile Application Defects from Online Reviews. In Proceedings of the 2017 SENTIRE Workshop of the IEEE 17th International Conference on Data Mining, 2017. (SENTIRE'2017) [PDF] [Bibtex] [Slides]

  • Yue Wang, Kuang Lu, and Hui Fang. Learning2extract for Medical Domain Retrieval. In Proceedings of the 2017 Asia Information Retrieval Societies Conference, 2017. (AIRS'2017) [PDF] [Bibtex] [Slides]

  • Yue Wang and Hui Fang. Extracting Useful Information from Clinical Notes. In Proceedings of the 2016 Text REtrieval Conference, 2016. (TREC'16) [PDF] [Bibtex]

  • Yue Wang, Xitong Liu and Hui Fang. A Study of Concept-based Weighting Regularization for Medical Records Search. To appear in the 52nd Annual Meeting of the Association for Computational Linguistics, 2014. (ACL'14, Acceptance Rate: 26.2%) [PDF] [Bibtex]

  • Yue Wang, Hao Wu and Hui Fang. An Exploration of Tie-Breaking for Microblog Retrieval. In Proceedings of the 36th European Conference on Information Retrieval, 2014. (ECIR'14) (short paper) [PDF] [Bibtex]

  • Yue Wang and Hui Fang. Exploring the Query Expansion Methods for Concept Based Representation. In Proceedings of the 2014 Text REtrieval Conference, 2014. (TREC'14) [PDF] [Bibtex]

  • Yue Wang, Jerry Darko and Hui Fang. Tie-breaker: A New Perspective of Ranking and Evaluation for Microblog Retrieval. In Proceedings of the 2013 Text REtrieval Conference, 2013. (TREC'13) [PDF] [Bibtex]

  • Yue Wang, Irene Manotas, Kristina Winbladh and Hui Fang. Automatic Detection of Ambiguous Terminology for Software Requirements. In Proceedings of the 18th International Conference on Application of Natural Language to Information Systems, 2013. (NLDB'13) [PDF] [Bibtex]

  • Miguel A. Callejas P, Yue Wang and Hui Fang. Exploiting Domain Thesaurus for Medical Record Retrieval. In Proceedings of the 2012 Text REtrieval Conference, 2012. (TREC'12, No.6 group in 2012 TREC Medical Records track) [PDF] [Bibtex]


Mining Mobile Apps for Early Bug Identification

2015.4 ~ 2017.4

  • Task: Identify sentences from a user review describing defective features in mobile apps.
  • Challenges: Given limited training resources and a review for a mobile app, verify whether the review reports a functional defect. If so, identify which sentences describe the defect.
  • Solutions: Built a hidden SVM classifier that uses partially annotated data sets at both the sentence and review levels to identify defect-reporting sentences.
  • Achievement: One of the few systems that can identify defect reports at the sentence level with limited training data.

Key Term Identification in Verbose Clinical Queries

2016.10 ~ 2017.5

  • Task: Identify the important terms in a verbose query that could be useful for document retrieval in the medical domain.
  • Challenges: Abbreviations and domain-specific knowledge play an important role in the medical domain. Without such knowledge, it is hard to predict which terms are more important than others, especially with limited training sets.
  • Solutions: Built a classifier with novel domain-specific features and medical lexicon features that can be trained with only a limited number of training instances.
  • Achievements: The proposed method could successfully identify the important terms from verbose queries.

Integrated Search System for JPMC

2014.8 ~ 2015.11

  • Task: Develop an integrated search system with the software team at JP Morgan Chase.
  • Challenges: Integrate searchable objects across different domains and identify concepts with similar semantic meanings from different resources.
  • Solutions: Built an integrated search system on top of Solr and MongoDB, which could automatically identify similar terms in each domain and convert natural language search queries into SQL-style queries.
  • Achievement: A cross-database retrieval system that can automatically map verbose search queries into SQL queries.

Medical Domain Retrieval System

2012.9 ~ 2014.6

  • Task: Identify patients matching a set of clinical criteria based on their medical records.
  • Challenges: Correctly identify and match the clinical terms for a disease, and handle negation in natural language.
  • Solutions: Convert the term-based representation to a concept-based representation, then apply two weighting regularization methods to overcome the inaccurate mappings generated by the NLP tool.
  • Achievements: The initial system ranked 6th out of 88 submitted systems in the TREC 2012 Medical Records Track. The improved system later achieved performance comparable to state-of-the-art methods in TREC 2012 while using fewer external resources and less processing time.

Microblog Retrieval

2013.3 ~ 2013.12

  • Task: Build a real-time ad-hoc retrieval system for a tweet collection.
  • Challenges: Because tweets are shorter than normal documents, traditional retrieval signals may not work well. In addition, no future information is allowed in the system due to the time-sensitive nature of tweets.
  • Solutions: Extend the tie-breaking framework with query expansion and document expansion techniques.
  • Achievements: A top-3 ranked system based on the TREC Microblog Track 2012.

Software Requirement Specification Disambiguation

2011.10 ~ 2012.8

  • Task: Identify potentially ambiguous concepts in software requirement specifications.
  • Challenges: Concepts may not have a clear definition, and the total number of ambiguous concepts varies from project to project.
  • Solutions: Developed an algorithm to score ambiguity and used the scores in two feature-based information retrieval techniques to rank all important concepts.
  • Achievements: One of the first papers to detect ambiguous terminology in software requirement specifications. Experimental results over four real-world data sets show that the proposed methods are effective.