Sysrev is a free document review platform. It helps you create PDF, PubMed, and general academic article reviews.

This post is a deep dive. To get started quickly, check out the getting started post.

Sysrev supercharges data extraction projects. "Sysrevs" involve an article data source, tasks, and a reviewer team. Reviewers execute tasks on the data source.

The basic Sysrev pipeline: a data source feeds into Sysrev, which manages the distribution of tasks to both humans and machines.

Sysrev applies a few critical concepts to this process: Audits, Open Access, Human-Machine Interface, and Generality.

Audits: Sysrev records every reviewer action and makes it accessible with a link. This means that the results of any kind of document review can be audited.

Open Access: Open access reviews allow anybody to view and download public review data. This makes reviews more trustworthy and reusable. Learn more in the fair-review and proof-of-review posts. There are more than 500 open access projects on Sysrev.

All the data for every article review in the 40-reviewer NIEHS project is visible to the world. You can audit exactly who reviewed which articles and what data they assigned.

Human-Machine Interface: AI is making great strides in natural language processing, image recognition, and generalized classification. Sysrev lets you teach a machine to perform a human task and scale that task to greater heights. We built a demonstration from open access gene annotation data: write a search query and see which genes are mentioned most often in the resulting PubMed abstracts. This is just one example of the human-machine interfaces made possible with Sysrev.
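The gene-counting demonstration boils down to a simple pattern: tokenize each abstract and tally matches against a list of known gene symbols. Below is a minimal sketch of that idea; the abstracts and the tiny gene list are made up for illustration, and Sysrev's actual pipeline and annotation data are not shown here.

```python
from collections import Counter

# Hypothetical abstracts; in practice these would come from a PubMed search.
abstracts = [
    "TP53 mutations co-occur with EGFR amplification in glioblastoma.",
    "We observed EGFR overexpression but no change in BRCA1.",
    "BRCA1 and TP53 pathways interact during DNA repair.",
]

# A small stand-in for an open access gene annotation list.
known_genes = {"TP53", "EGFR", "BRCA1", "KRAS"}

def count_gene_mentions(texts, genes):
    """Count how many abstracts mention each known gene symbol."""
    counts = Counter()
    for text in texts:
        # Strip basic punctuation so "TP53." matches "TP53".
        tokens = {tok.strip(".,;()") for tok in text.split()}
        for gene in genes & tokens:
            counts[gene] += 1
    return counts

print(count_gene_mentions(abstracts, known_genes).most_common())
```

A production version would need proper named-entity recognition (gene symbols collide with ordinary words), but exact symbol matching is enough to show the shape of the task.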

Generality: Imagine a document review on electronic health records, histological images, patents, legal documents, tweets, or any other document collection. Sysrev enables generalized review of arbitrary data streams.


Much of what companies do involves curation of multiple data sources. This curation process involves acquisition of data, review of that data, and analysis of that data. It can be simple (your inbox is a data source that you review daily) or quite complex (hedge funds pull, review, analyze, and make high-stakes decisions on many data streams).

In medicine, doctors review electronic health records, researchers review academic articles, grant writers review grants, and professors review student assignments.

In risk assessment, reviewers assess lab reports and case studies to determine the risk associated with drugs, chemicals, products, and so on.

Companies that work in the biochemical fields must review regulatory documents, chemical safety data sheets, news outlets, conference presentations, market reports, cost reports, and so on.  All of these review processes benefit from formalization, automated management, and automated data extraction.

  1. Formalization: Formalizing the data extraction process means precisely defining what tasks are required for each entity in a data source. Screening is a common example: reviewers are asked to mark entities as relevant or irrelevant. Formalizing this process allows for programmatic solutions and improved quality control. Sysrev enables rigorous definition of tasks, auditing, and integration of results with external data sources.
  2. Automated Management: Assigning labels to documents seems like a simple task, but large teams and document sets quickly make these projects unmanageable. Management is one of the easiest places to leverage the combination of humans and machine learning. At the most basic level, simply automating task assignment and tracking completion can yield large efficiency gains. At a more complex level, human performance and concordance can be evaluated and fed back into the management process.
  3. Automated Data Extraction: Computational tools have long helped people work better with other people, but machine learning introduces a new kind of intelligence. Automating the enrichment of raw data sources works best when machines and human intelligence work together.
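To make the management point concrete, here is a minimal sketch of automated task assignment: each document is routed to a fixed number of distinct reviewers in round-robin fashion. This is not Sysrev's actual implementation, just an illustration of the bookkeeping that becomes unmanageable by hand as teams and document sets grow.

```python
from itertools import cycle

def assign_tasks(documents, reviewers, reviews_per_doc=2):
    """Assign each document to `reviews_per_doc` distinct reviewers, round-robin.
    Assumes reviews_per_doc <= len(reviewers)."""
    pool = cycle(reviewers)
    assignments = {doc: [] for doc in documents}
    for doc in documents:
        while len(assignments[doc]) < reviews_per_doc:
            reviewer = next(pool)
            if reviewer not in assignments[doc]:
                assignments[doc].append(reviewer)
    return assignments

docs = ["doc1", "doc2", "doc3"]
team = ["ana", "ben", "chris"]
print(assign_tasks(docs, team))
```

Double review (two independent reviewers per document) is what makes concordance measurable in the first place; the round-robin keeps reviewer workloads balanced.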

    Sysrev already does this almost transparently during screening projects. Human review is quietly prioritized to maximize the speed of machine learning. This kind of hidden integration of human and machine intelligence lets technology help without getting in the way. Many projects have already benefited from our screening tools, and more automated extraction tools are on the way.
Automated screening was the first Sysrev machine learning tool. The algorithm quietly prioritizes human screening to maximize the performance of a trained screening model, and the resulting models sometimes work very well. The three models above come from completed ERAS Spine and Craniotomy projects. The x-axis gives the screening model's probability of inclusion; green bars count the articles actually included by a reviewer, and red bars count the articles actually excluded. These models work well, but they are not perfect.
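One standard way to implement this kind of prioritization is uncertainty sampling: route humans to the articles whose predicted inclusion probability is closest to 0.5, because those labels teach the model the most. Whether Sysrev uses exactly this rule is an assumption; the sketch below, with made-up probabilities, just illustrates the idea.

```python
def prioritize_screening(probs):
    """Order articles so the model's most uncertain cases (p near 0.5) come first.
    Uncertainty sampling is one common active-learning rule; it is an
    assumption here, not necessarily Sysrev's exact algorithm."""
    return sorted(probs, key=lambda article: abs(probs[article] - 0.5))

# Hypothetical predicted inclusion probabilities for four unreviewed articles.
probs = {"a1": 0.95, "a2": 0.48, "a3": 0.10, "a4": 0.60}
queue = prioritize_screening(probs)
print(queue)  # a2 is most uncertain, so it is reviewed first
```

After each batch of human labels, the model is retrained and the queue reordered, so reviewer effort keeps flowing to where the model is least sure.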

Sysrev is now implementing custom data sources. Early projects, like the vitamin C cancer trials, leveraged manual integration of external data sources.

To get started with your own project, just sign up. Read our getting started post to learn how to create a sysrev in just 2-3 minutes. If you liked this post, subscribe!

Use the bibtex below to cite this post:

  @misc{whatissysrev,
    title={What is Sysrev?},
    year={2019 (accessed <your date>)},
  }