Reinventing AI Research & Development: Part VI
In the previous article of our Reinventing AI R&D series, we recapped the progress made by the AI R&D Operations rotation on our team processes, MLOps proof of concept, new AI projects and new publications, and hinted at some potential applications of GPT-3 for the business. Building on the work of the previous rotation, we have continued to follow the updated, more efficient processes it developed, which include a new peer review process and an updated weekly meeting format. As our team continues to grow and mature, we will keep adopting new ideas and concepts for knowledge sharing and team engagement.
Throughout our time on the latest AI R&D rotation, we focused on fostering new, innovative ideas and increasing our knowledge sharing and collaboration across teams. This has led to a sharp uptick in engagement from other teams, including Digital, IT and Cloud. We are excited to continue engaging more areas of the business and expanding the breadth of our applications within the company.
This article spotlights our primary objectives from the past few months, including:
- Developing a new project and application of artificial intelligence related to image analysis.
- Publishing new articles and insights onto the platform.
- Establishing MLOps Tiger Teams (teams of specialists focused on specific goals).
- Applying for access to the GPT-3 API from OpenAI.
- Ramping up the University Relations program and machine learning (ML) in cancer research.
- Outlining plans for Q1 2021.
Project focus during our time on the AI R&D Ops rotation
Magnetic Resonance Imaging (MRI) tumor identification and reverse image look-up
When WWT data scientists are on an AI R&D rotation, they work in pairs on a project chosen by the Project Selection Panel (PSP). The R&D Operations team collaborates closely with the data scientist pair to push the chosen project forward.
Since August, multiple data scientist rotations have contributed to the same AI R&D project, MRI Tumor Identification and Reverse Image Look-up. The project's goal is to use AI to rapidly identify problematic areas (e.g., tumors) and supply physicians with similar cases to aid in diagnosis and in suggesting treatment options.
Labeling and identifying tumors requires years of radiology training, but the data science team was able to use the Brain Tumor Segmentation (BraTS) dataset. This dataset originates from a competition out of the University of Pennsylvania and contains brain tumors labeled by neuroradiologists. A combination of You Only Look Once (YOLO) object detection and autoencoder (AE) techniques provides tumor detection and retrieval of similar patient cases.
The current data science rotation is wrapping up this work. Stay tuned for a white paper soon!
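The "reverse image look-up" step can be pictured as a nearest-neighbor search over image embeddings. The sketch below is an illustration only, not the project's implementation: it assumes each MRI has already been compressed to a short vector (for example, by an autoencoder's encoder), and the case IDs, vector values and the choice of cosine similarity are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)

def most_similar_cases(query, library, k=3):
    """Return the k case IDs whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query, emb), case_id)
              for case_id, emb in library.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [case_id for _, case_id in scored[:k]]

# Hypothetical 4-dimensional embeddings; a real latent vector would be larger.
case_library = {
    "case_a": [0.9, 0.1, 0.0, 0.2],
    "case_b": [0.1, 0.8, 0.3, 0.0],
    "case_c": [0.8, 0.2, 0.1, 0.3],
}
query_embedding = [0.85, 0.15, 0.05, 0.25]
print(most_similar_cases(query_embedding, case_library, k=2))
```

In a setup like this, a detector such as YOLO would first locate the region of interest, and the embedding search would then return prior cases that look similar to it.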
New publications
The newly formalized peer review process is now in full swing. Our volunteer reviewers bring a wide range of expertise, including business strategy, software engineering, ML infrastructure engineering, data science and human-centered design. The papers below successfully flowed through the pipeline to reviewers in the relevant categories.
- An Ensemble Approach to Data Mining for Real-Time Information Retrieval
- Getting Started with MLOps: For Data Scientists
Continuing development on MLOps
Our previous article discussed the creation of an MLOps proof of concept (POC), an ambassador group and a training platform. The ambassador group was created to promote MLOps training and learning and to expand the ambassador network through offerings such as the MLOps Platforming Workshop. This rotation witnessed the formation of four MLOps Tiger Teams, each a team of specialists formed to work on specific goals. The purpose of the four Tiger Teams is to:
- Design learning journeys: Identify and recommend learning resources, assess and test learning options, record learnings/roadblocks and design persona-based learning paths.
- Understand the market and provide thought leadership: Lead market research, attend MLOps conferences and write white papers and articles.
- Engage clients and develop the offering: Design a bleeding-edge offering, maintain content based on lessons learned, prepare teams for business development conversations and share content with internal teams.
- Explore technical solutions: Compare and evaluate MLOps tools, deploy models using an MLOps platform, maintain a knowledge management repository and develop a point of view on effective implementation.
The Tiger Teams met for the first time in October. We expect MLOps to move on from being an AI R&D initiative and eventually become embedded in our normal service offerings. In the last few months, the program has focused on finding ways to deploy models using MLOps pipelines, work that will be reflected in future white papers. Our focus will be to present and explore opportunities with clients who are interested in MLOps.
Current progress and business use cases for GPT-3
GPT-3 is known as the largest language model ever created. Developed by OpenAI, GPT-3 is a natural language processing (NLP) technology that generates human-like text from a simple prompt. When broken down, GPT-3 has three core functions: filling in the blanks; translating ideas into text or text-based visuals; and statistical correlation, drawing on a massive "encyclopedia" of text. Essentially, GPT-3 helps answer the question: what word tasks would you do if you had an infinite team?
While it does not take us any closer to true artificial intelligence, this model is so large that it can perform the language tasks of multitudes of people very efficiently and effectively. This could automate many tedious word tasks and create countless business efficiencies. The technology is currently in the beta testing phase, allowing access to specific users with novel applications that could further investigate the potential benefits of GPT-3.
Simply stated, users supply a text prompt in a use-case script along with parameters such as the maximum length of the text output (quantified in tokens). The script calls the OpenAI API, which generates a response to the prompt; because the output is sampled, the same prompt can produce different responses. In the example below, the function expands on the text prompt input (which provides an excellent description of how the API works, and which we highly recommend reading).
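As a separate, minimal sketch of the prompt-plus-parameters idea, the snippet below assembles the body of a completion request without sending it, since sending requires API access. The helper name `build_completion_request` and its default values are hypothetical; `prompt`, `max_tokens` and `temperature` are parameter names used by OpenAI's completions API.

```python
import json

def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Assemble the JSON body for a text-completion request (nothing is sent)."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,    # upper bound on generated length, in tokens
        "temperature": temperature,  # higher values make sampling more varied
    }

# Illustrative prompt only; the values here are assumptions, not recommendations.
body = build_completion_request("Explain what an API is in one sentence:", max_tokens=40)
print(json.dumps(body, indent=2))
```

Keeping the request assembly in one small function makes it easy to vary the token limit or sampling temperature per use case before any call is made.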
However, the API is not just limited to text outputs. The same type of prompt can take the output even further and display something visual (based on a generated natural language output, of course). For example, existing API use cases include chart, code and website generation.
The AI R&D program has applied for beta access to the API, and the application is pending approval. Our team seeks to use the API to improve our offerings and capabilities and promote our customers' success. GPT-3 has the potential to connect the dots between our capabilities and our customers' needs, enabling a more seamless end-to-end solution for our customers, from point-of-sale (POS) and resource management to data and contextual metadata analytics and insights.
While there are countless ways of utilizing the GPT-3 API and NLP technology, there are WWT-specific use cases detailed here that may add significant value for our organization and our customers.
Use case 1
While focusing on helping our customers modernize their IT infrastructure, we would like to eventually develop solutions leveraging GPT-3 to streamline and innovate their IT operations. To start, we would first want to focus internally on our own IT operations as a pilot. WWT has a robust IT infrastructure, including an Advanced Technology Center (ATC), that is used as an IT lab environment for customers and WWT researchers. The ATC is constantly evolving to accommodate the testing of new AI tools, technologies and architecture and to help us maintain our own IT operations as effectively as possible. Incorporating GPT-3 in this space may help us achieve this.
Determining the root cause of IT outages in a global enterprise can be difficult, and many Help Desk tickets are handled disparately and individually. Developing a solution with GPT-3 would allow WWT to listen to and contextualize various IT issues and resolutions throughout the organization while correlating them with the telemetry data coming off the IT infrastructure. This solution could recommend proactive resolutions and paint a broader picture of more systemic issues that may be occurring, resolving problems more holistically. With our technology labs, we would be able to stand this solution up and pilot it internally first, all while developing solutions catered to our customers' needs that we could commercialize in the future.
Use case 2
WWT works with several quick-service restaurant chains to implement data-driven insights and management tools. There is great potential in this space for a more seamless solution utilizing GPT-3. We envision using this technology predictively, combining prior data with contextual data about current events in the area to power more accurate insights that answer the questions of "What?" "How much?" and "When?" before they are asked, and to automate the action items associated with these insights. This may include ordering extra ingredients before a specific event or timeframe, as well as scheduling extra staff and other insight-driven management tasks.
All this information can be provided in plain language to restaurant management through GPT-3, as well as a chatbot that can interact with restaurant employees. Furthermore, this would be a powerful technology to leverage for POS and restaurant management software (RMS). We envision leveraging GPT-3 to automate aspects of the POS, translating a customer's verbal order to the computer and initiating the subsequent action items, such as billing and transaction, food preparation and inventory management.
Utilizing ML in cancer research
As the AI R&D program looks to broaden its innovative reach, we are exploring university partnerships as well. WWT is interested in fostering strong partnerships with universities and research programs to build recruiting bridges and leverage data analytics capabilities for unique and differentiated applications.
The AI R&D program seeks to expand its exposure and knowledge base in analytics applications and interesting data sets, advancing state-of-the-art technologies with data science as well as co-publishing high-profile articles to demonstrate thought leadership when possible. Beginning with cancer research at Northwestern University, the AI R&D program seeks to expand to other universities and research programs in the future.
Initially, our team is piloting a relationship with Northwestern University, combining partial wave spectroscopy (PWS) and ML to help fuel cancer research developments. We have also established a relationship with the Backman Lab at Northwestern University. The Backman Lab focuses on finding variances in patterns of light reflection from cells indicative of a cancer-rich environment through a technology called PWS. With PWS, scientists can measure light reflection, quantifying nanoscale changes in chromatin, and research has shown that cancerous cells may have different reflection behaviors than non-cancerous cells.
Not only do cancerous cells behave differently than healthy cells, but non-cancerous cells from a cancer-rich region often do as well. This is what makes things interesting: by identifying in a non-invasive way whether a patient's non-cancerous cells are indicative of cancer, measures can be taken earlier on to detect and eradicate cancer before it spreads.
At the Backman Lab, researchers are zeroing in on lung cancer. When an individual has lung cancer, nanoscale chromatin changes happen in cells around the cancerous region, which can be detected through PWS. In this case, these changes can be found in cells within the mouth and cheek. The idea is that a simple swab of the cheek could produce cells that would be inspected under a PWS microscope and analyzed for light reflection patterns indicative of cancer. However, deciding which patterns and cells are indicative, and to what level of confidence, is the tricky part, which is where our AI R&D team and ML expertise come into play.
Our team has set out to utilize ML to better understand these patterns, more effectively identify which patients have a high likelihood of cancer and recommend a more invasive form of screening, such as a low-dose CT. Ultimately, the goal at the Backman Lab is to commercialize PWS for the early detection of lung cancer. Meanwhile, the goal for our AI R&D team is to leverage ML technology for faster and more effective implementation of PWS methodology to detect cancer, such as better capturing cell features critical to PWS and reducing data processing efforts for PWS scalability.
To start, our team has taken on data from 17 patients for initial discovery (7 cancerous and 10 non-cancerous for control). For each patient, 40 cheek cells are analyzed at 84 wavelengths each, creating an "image cube" of data shown in the figure below.
At a high level, the approach (shown in the next diagram) is to ingest the image cube for a patient, identify and label the features (cell and nucleus) and utilize ML to label individual cells as normal or abnormal. Collectively, the image cube with its normality-labeled cells would be analyzed as a whole and compared to control patients to identify if the patient is at risk of cancer and at what level of confidence, ultimately informing if the patient should be sent on to the next level of screening.
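The final step of this approach, comparing a patient's labeled cells against control patients, could be sketched as follows. This is an assumption-laden illustration rather than the team's method: the 0/1 cell labels, the z-score comparison, the threshold of 2.0 and every number in the example are made up for demonstration.

```python
from statistics import mean, stdev

def abnormal_fraction(cell_labels):
    """Share of a patient's cells labeled abnormal (1 = abnormal, 0 = normal)."""
    return sum(cell_labels) / len(cell_labels)

def risk_z_score(patient_labels, control_fractions):
    """Number of control standard deviations the patient sits above the control mean."""
    mu = mean(control_fractions)
    sigma = stdev(control_fractions)
    if sigma == 0:
        raise ValueError("control group has no variation; z-score is undefined")
    return (abnormal_fraction(patient_labels) - mu) / sigma

def needs_followup(patient_labels, control_fractions, z_threshold=2.0):
    """Flag the patient for further screening if well above the control range."""
    return risk_z_score(patient_labels, control_fractions) > z_threshold

# Synthetic abnormal-cell fractions for 10 control patients (40 cells each).
controls = [0.05, 0.10, 0.075, 0.05, 0.10, 0.125, 0.075, 0.05, 0.10, 0.075]

# Synthetic patients: 16 of 40 cells abnormal, and 3 of 40 cells abnormal.
patient_high = [1] * 16 + [0] * 24
patient_low = [1] * 3 + [0] * 37

print(needs_followup(patient_high, controls), needs_followup(patient_low, controls))
```

A z-score is only one simple way to express "at what level of confidence"; a production approach would likely use a calibrated classifier and a far larger control group.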
There were several initial challenges with the data, including access, storage and sharing. We began by setting up an AWS environment within our own ATC lab and granting access to users at Northwestern University in order to share data and collaborate on analysis and findings. WWT has begun initial research: labeling the cell-level data using Principal Component Analysis (PCA), visualizing a compressed representation of the image cube of cellular data using an autoencoder and improving cell/nucleus labeling using automated edge detection.
The areas of the process we initially hope to improve through ML include capturing cell features critical to PWS analysis more effectively and reducing data processing efforts for PWS scalability. Stay tuned to hear more about our analysis and findings as the research continues!
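To make the PCA step concrete, here is a minimal, dependency-free sketch that finds the first principal component of a handful of spectra by power iteration and projects each spectrum onto it. The data are synthetic three-wavelength stand-ins (real PWS spectra span 84 wavelengths), the function names are hypothetical, and in practice a library implementation would be the more likely choice.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (features x features) of centered data."""
    n = len(centered)
    d = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # fixed start; fails only if orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: there is no direction of variance
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Score of each sample along the given component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]

# Synthetic spectra: one row per sample, one column per wavelength.
spectra = [
    [1.0, 2.0, 3.0],
    [2.0, 4.1, 6.0],
    [3.0, 6.0, 9.1],
    [4.0, 7.9, 12.0],
]
centered = center(spectra)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
print(pc1, scores)
```

Each score summarizes a full spectrum as a single number along the direction of greatest variance, which is the kind of compressed feature that could then feed cell-level labeling.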
Growth of the AI R&D team: Multicloud and digital teams join the program
Our rotation witnessed significant growth of the AI R&D program as the multicloud and digital teams started participating in the program. There was increased representation from various multicloud teams to support the development of a new MLOps client offering, based on prior R&D project work. The cloud team developed the proof of concept and assisted in driving out the MLOps offering. We also saw increased participation by the digital team as they supported the submission of the GPT-3 application for beta access.
Toward the latter half of the rotation, we saw increased participation and knowledge sharing from additional WWT groups like IT and Lab Services. They had the opportunity to present on using AI to analyze survey data at the weekly meeting on 11/19.
What's next?
While we have had great progress in furthering our innovative capabilities and reach during our rotation, the job is far from done. Moving forward, the next rotation on the R&D Team will have their hands full continuing the progress on these projects as well as others to come.
In addition to the creation of the Tiger Teams, the AI R&D program aims to find ways to deploy models using MLOps pipelines. Thereafter, our focus will be to present and explore opportunities with clients who are interested in MLOps.
Our team is still seeking access to the OpenAI API for GPT-3, and until we achieve that, the benefits of the technology for our business will not be fully understood. As the AI R&D Team continues to grow, we are hoping to make progress on this front to better understand the technology's applications and implications for the business, and hopefully begin to pilot solutions for both research and our customers.
Lastly, our partnership with the Backman Lab at Northwestern has only just begun. There is still a long and exciting road ahead in utilizing ML in cancer research and detection. We see a strong future for our partnership and great potential for this technology to impact countless lives. Beyond Northwestern University, we hope to begin scaling our partnership program to other universities and research programs as well.