In one form or another, reviewing documents and extracting data are important processes for any modern organization.

Admittedly, there are a variety of document types and a near-infinite amount of data that can be extracted.  Sysrev's biggest strength is its ability to support any review of any document.  Moreover, Sysrev's built-in machine-learning capabilities result in significant time savings.  This post gives step-by-step details of a Managed Review we conducted on the substance mangiferin.  This case study is also available in video format.

Step 1: Scope

The first step in any review is to decide the purpose of the review.  For the mangiferin project, our goal was to explore mangiferin as a potential non-food additive.  Specifically, the purpose was to extract information about in vivo studies that explored the biological effects of pure mangiferin, as well as mixtures.  

Step 2: Find & Upload Initial Documents

Oftentimes, users of Sysrev already have a corpus of documents to review.  For these cases, Sysrev supports a variety of document uploads including PDF, RIS, and Endnote XML.  For everyone else, Sysrev supports custom integrations to databases, through which users can perform metadata or keyword-based searches of the database and import the results directly into a Sysrev Project.  Contact us about custom integrations at info

For this project, we started with a simple PubMed search for "mangiferin".  As this search yielded just 725 articles, we decided against narrowing our search.  
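
For readers who want to reproduce a search like this programmatically, here is a minimal sketch built on NCBI's public E-utilities `esearch` endpoint. It is not how Sysrev performs its import. The helper names `build_esearch_url` and `parse_esearch` are hypothetical, and the sample response is an abbreviated stand-in with placeholder PMIDs, not real output.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, retmax=1000):
    """Build a PubMed esearch URL that returns matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"

def parse_esearch(body):
    """Return (total hit count, list of PMIDs) from an esearch JSON body."""
    result = json.loads(body)["esearchresult"]
    # esearch reports the count as a string, so convert it explicitly
    return int(result["count"]), result["idlist"]

# Abbreviated sample for illustration only; the PMIDs are placeholders
sample = '{"esearchresult": {"count": "725", "idlist": ["11111111", "22222222"]}}'
count, pmids = parse_esearch(sample)
```

To run it against the live service, fetch `build_esearch_url("mangiferin")` with `urllib.request.urlopen` and pass the response body to `parse_esearch`. The count returned today will likely differ from the 725 we saw.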

Step 3: Screen Documents & Train AI

For most use cases, the next step in any review is to Screen Documents: that is, to review the documents and decide whether the data within should be included in or excluded from the review.  This step has several benefits, including:

  1. Familiarizes the project coordinators with the literature
  2. Standardizes 'inclusion criteria' for all reviewers
  3. Trains Sysrev's Inclusion Predictor, a machine-learning-based model that learns from the reviewers and can predict whether or not a document will be included in the review.

Project 20571 - 'Mangiferin In Vivo Screening' Overview Dashboard

For our mangiferin project, two Sysrev employees (our founder, Dr. Luechtefeld, and I) performed the initial screening.  To save time, we screened only 204 of the 725 articles (51 articles were dual-reviewed) and let Sysrev do the rest (see below).

Sysrev's Inclusion Predictor can save significant time in both the Screening and Data Extraction phases of a review.  As shown below, after we had screened 204 articles, the Inclusion Predictor reached high confidence in its predictions for the remaining unreviewed articles.

While this approach is perhaps not rigorous enough for a true Systematic Review, it is more than adequate for scoping reviews or, as in our case, strategic reviews.  To put this in context, our goal with the mangiferin project wasn't to learn everything learnable; it was to present evidence to a partner about mangiferin's overall potential and so inform their decision on whether to study its effects further.

For this reason, we utilized the Inclusion-Predictor to great effect.  

Step 4: Select Documents for Data Extraction

Once again, in a more rigorous review, we would have manually screened all 725 articles.  However, to save time, we let Sysrev's Inclusion-Predictor finish the screening.  Sysrev allows users to filter articles by a number of attributes, one of which is the Inclusion Predictor.  

Not wanting to exclude valid information, we decided to include any article to which the Inclusion Predictor assigned at least a 25% chance of being included.  Although this threshold may seem conservative, it still drastically reduced the number of articles from 725 to 292.  To begin the data extraction process, we simply exported the filtered articles and imported them into a new project: 'Mangiferin In Vivo - Compensated Data Extraction.'
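
The filtering logic itself is simple. Below is a minimal sketch, assuming each article carries a predicted inclusion probability between 0 and 1. The `Article` class, its field names, and the sample scores are all hypothetical; in practice the filtering happens inside Sysrev's article filters.

```python
from dataclasses import dataclass

@dataclass
class Article:
    article_id: str
    inclusion_probability: float  # hypothetical field: predictor score in [0, 1]

def select_for_extraction(articles, threshold=0.25):
    """Keep articles whose predicted inclusion probability meets the threshold."""
    return [a for a in articles if a.inclusion_probability >= threshold]

# Made-up scores for illustration
articles = [
    Article("A", 0.91),
    Article("B", 0.25),
    Article("C", 0.24),
    Article("D", 0.03),
]
selected = select_for_extraction(articles)
print([a.article_id for a in selected])  # prints ['A', 'B']
```

Lowering the threshold keeps more borderline articles at the cost of more extraction work. In our project, the 25% cutoff kept 292 of 725 articles, or about 40%.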

Step 5: Define Labels

Sysrev defines a Label as any piece of information to be extracted. Labels come in three varieties: boolean, categorical, and string.  For any one project, Sysrev supports an unlimited number of labels, though we would caution against using too many, as that can confuse reviewers.  For our mangiferin project, we had 14 labels.

Sysrev Project 21696 'Mangiferin In Vivo - Compensated Data Extraction' Label Definitions Dashboard
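
To make the three label varieties concrete, here is a minimal sketch of how label definitions and answers could be modeled and checked. This is an illustration only, not Sysrev's data model: the `Label` class, its `validate` method, and the example labels are hypothetical and are not the 14 labels used in the project.

```python
from dataclasses import dataclass, field

@dataclass
class Label:
    name: str
    kind: str  # "boolean", "categorical", or "string"
    categories: list = field(default_factory=list)  # used only for categorical labels

    def validate(self, value):
        """Return True if `value` is an acceptable answer for this label."""
        if self.kind == "boolean":
            return isinstance(value, bool)
        if self.kind == "categorical":
            return value in self.categories
        if self.kind == "string":
            return isinstance(value, str) and value.strip() != ""
        raise ValueError(f"unknown label kind: {self.kind}")

# Hypothetical labels and answers, for illustration only
labels = [
    Label("include", "boolean"),
    Label("species", "categorical", ["mouse", "rat", "other"]),
    Label("dose", "string"),
]
answers = {"include": True, "species": "rat", "dose": "50 mg/kg"}
all_valid = all(label.validate(answers[label.name]) for label in labels)
```

Constraining answers by type like this is one reason to prefer boolean or categorical labels over free-text strings where possible: the answers are easier to check and to aggregate later.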

Step 6: Data Extraction

The "final" step of any review is to extract the data.  

Instead of performing the extraction ourselves, we decided to use another unique feature of Sysrev: Contract Reviewers.  After a brief QA session to make sure our labels were in order (18 articles total), we recruited five paid reviewers to extract data from the remaining articles.  These reviewers were scientists from various biological and chemical fields.

As we wished to move quickly (and we were asking for a fairly complex extraction), we decided to pay reviewers $2 per article.  In less than two weeks, each of the 292 articles had been reviewed (102 articles were dual-reviewed).  


In the end, the entire mangiferin project from start to finish - project creation to extraction - took about three weeks and just a few hours of Dr. Luechtefeld's and my time.  All told, we paid $840.40 USD to have 292 (of the original 725) articles fully reviewed - from which over 3500 pieces of information were extracted.  
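
As a quick back-of-the-envelope check, the totals above translate into unit costs as follows. The figures come straight from this post; since "over 3500" is a lower bound on the number of data points, the per-data-point figure is an upper bound.

```python
total_paid_usd = 840.40   # total paid to reviewers, from this post
articles_reviewed = 292   # articles fully reviewed, from this post
data_points = 3500        # "over 3500", so treat as a lower bound

cost_per_article = total_paid_usd / articles_reviewed
max_cost_per_data_point = total_paid_usd / data_points

print(f"${cost_per_article:.2f} per article")                          # $2.88 per article
print(f"at most ${max_cost_per_data_point:.2f} per extracted data point")  # at most $0.24
```

The per-article figure is higher than the $2 base rate, which is consistent with 102 articles having been reviewed twice.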

In this way, Sysrev is a great resource for any organization conducting a document review.  Whether the extraction is performed by internal employees or external contractors, Sysrev has the versatility (and machine-learning capabilities) to truly optimize the entire process.  

Step 7: Inform Models and Automate Extraction

The Inclusion-Predictor is not the only place where Sysrev utilizes AI.  Depending on both the extraction task and the document, we can automate large pieces of the review process.  

Moreover, we can build powerful models trained on the extracted data, as we did with our Gene Hunter Project.

To learn more about automating data extraction or utilizing the data produced by reviews in an efficient and optimized way, contact us at