The Controlled Environment Agriculture Open Data project aims to advance controlled environment agriculture (CEA) research, machine learning, and artificial intelligence through the collection and dissemination of crop production data.
Private companies and university researchers generate a considerable amount of data on controlled environment crop production, covering ornamentals, food crops, and cannabis. One open question is whether all this data is being used to its full potential to benefit the horticulture industry.
“Data has become a big topic in the horticulture industry with university researchers and private companies,” said Erico Mattos, executive director of the Greenhouse Lighting and Systems Engineering (GLASE) consortium. “People can identify with the challenges and opportunities with the amount of data that is being generated. However, we don’t yet have a centralized repository and a standard methodology for storage to allow us to explore and exploit this data.”
Addressing the data proliferation
In 2018, during the North Central Extension & Research Activity–101 (NCERA-101) meeting, members of this USDA-organized committee discussed what should be done with the extensive amount of data being generated by controlled environment researchers. Ohio State University professor Chieri Kubota proposed forming a subcommittee to develop guidelines for sharing data generated by controlled environment agriculture researchers.
“Dr. Kubota initiated the discussion about the need for a centralized platform to store data collected from controlled environment research,” Mattos said. “A task force was formed that included Chieri, Kale Harbick at USDA-ARS, Purdue University professor Yang Yang, Melanie Yelton at Plenty and myself. Since the task force was formed, Ken Tran at Koidra and Timothy Shelford at Cornell University have also become members of the task force.
“We started discussing how we could make use of all this data. Researchers in the United States collect a huge amount of data. All of the environmental data such as temperature, relative humidity, and carbon dioxide and light levels in controlled environment research is collected. There is also a biological set of data which includes plant biomass and fruit yield.”
Mattos said private companies also generate and collect a great deal of research data that is not shared with the horticulture industry.
“With the advancement in sensors and environmental controls, the capability now exists that this data can be collected,” he said. “With the advancements in computing power, this data can be used to start new applications and new tools that haven’t been available before. However, in order to do this, we have to have access to a large amount of data. That’s why the task force thought it would be good to create a repository where researchers and private companies could share the data following a specific format. This data could then be used in the advancement of machine learning and artificial intelligence applications to optimize crop yields in commercial CEA operations.”
Need for collecting and organizing data
Mattos said university researchers see the value in creating a centralized database.
“There are probably millions of big data points when you consider how many researchers are doing research in the U.S.,” he said. “Historically, these researchers have not been required to share their data. However, an increasing number of funding agencies and organizations, including USDA, are requiring that researchers share their data. If researchers apply for a grant from USDA, they are required to include information about their data management plans in their grant proposals.
“Researchers see the value of sharing this data, but it is not yet a common practice, and it involves allocating time and resources. Someone on the research team would have to organize and share the data.”
Creating a central database
To meet the need to collect and organize the controlled environment research data being generated, the task force established the Controlled Environment Agriculture Open Data (CEAOD) project. The project aims to promote data sharing to accelerate CEA research.
The CEAOD website provides guidelines, developed by the task force, on how to upload data. The guidelines cover three sets of data that can be uploaded to the website.
“One set is environmental data, including controlled environmental parameters such as temperature, carbon dioxide, relative humidity, and ventilation,” Mattos said. “These data points are usually collected automatically by sensors. Another set is biological data, which is usually collected by people. These biomass and yield parameters include shoot and root biomass and plant height and weight. The final set is the metadata, which describes the experimental setups and data sets. It is a file that explains the experiments and how they were done.
“There is a recommended format to follow when uploading data to the CEAOD website, and the step-by-step process is listed on the website. There are no restrictions on the crops for which data can be submitted. Our goal is to establish a platform that hosts a large number of crop production data sets to allow for the development of machine learning and artificial intelligence algorithms aimed at improving crop production efficiency.”
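To picture how the three data sets fit together, the Python sketch below models one submission as environmental sensor readings, biological measurements, and descriptive metadata, plus a simple check for missing parts. It is purely illustrative: the class names, field names (such as `air_temperature_c`, `co2_ppm`, and `shoot_fresh_mass_g`), and units are assumptions, not the format published on the CEAOD website, which is the one to follow for actual uploads.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical structure for illustration only; NOT the official CEAOD template.


@dataclass
class EnvironmentalReading:
    """One sensor record, usually logged automatically (assumed fields and units)."""
    timestamp: datetime
    air_temperature_c: float
    relative_humidity_pct: float
    co2_ppm: float
    ppfd_umol_m2_s: float  # light level as photosynthetic photon flux density


@dataclass
class BiologicalMeasurement:
    """One plant-level observation, usually recorded by hand (assumed fields)."""
    date: datetime
    plant_id: str
    shoot_fresh_mass_g: float
    root_fresh_mass_g: float
    plant_height_cm: float


@dataclass
class ExperimentMetadata:
    """Describes how the experiment was set up and carried out."""
    title: str
    crop: str
    facility_type: str  # e.g., "greenhouse" or "growth chamber"
    description: str


@dataclass
class DataSubmission:
    """Bundles the three data sets into one hypothetical submission."""
    metadata: ExperimentMetadata
    environmental: list[EnvironmentalReading] = field(default_factory=list)
    biological: list[BiologicalMeasurement] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of any data sets that are empty."""
        missing = []
        if not self.environmental:
            missing.append("environmental")
        if not self.biological:
            missing.append("biological")
        if not self.metadata.description.strip():
            missing.append("metadata description")
        return missing


# Usage: a submission with metadata and one sensor reading, but no plant data yet.
sub = DataSubmission(
    metadata=ExperimentMetadata(
        title="Hypothetical lettuce trial",
        crop="lettuce",
        facility_type="growth chamber",
        description="Illustrative example experiment",
    ),
    environmental=[
        EnvironmentalReading(datetime(2021, 1, 5, 12, 0), 22.0, 65.0, 800.0, 250.0)
    ],
)
print(sub.missing_parts())  # ['biological']
```

Keeping sensor data, hand-collected plant data, and metadata in separate collections mirrors the way the three sets are gathered differently, while the metadata ties them to a single experiment.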
Leading by example
This winter, GLASE will have a student collecting and organizing environmental and biological research data.
“The data will be uploaded to the CEAOD database, and we will be documenting these activities,” Mattos said. “We will create a guideline of recommendations. We also plan to work with researchers from other institutions to demonstrate how the data can be organized and uploaded, to create awareness, and to show how to use the database.
“We hope this initial GLASE contribution will incentivize other researchers to share their data and will facilitate the uploading process. Access to the CEAOD database is free. It is an open platform, and anyone can contribute to the development of this database tool.”
Benefits to the horticulture industry
Mattos said private companies would also benefit from collecting data and creating a centralized database.
“These companies need more data because it would allow them to analyze the data to develop new products and identify new markets,” he said. “Unfortunately, many of these companies don’t want to share their data. They are very proprietary about their data. They see that collecting and analyzing this data can put them ahead of their competition.
“Many private companies see the need for more data and how it can be valuable but are unwilling to share their own data. But like in other industries, there are early adopters. I believe there will be companies that step up and will share their data with the horticulture industry. Hopefully, industry people will be willing to contribute and work on this database as well.”
Mattos said one of the project’s major applications is machine learning and artificial intelligence.
“With these applications, large sets of data are needed in order to create baselines,” he said. “Using the data, machines can be taught. Currently, growers’ production knowledge and opinion are more accurate for growing crops than artificial intelligence predictions. Growers are still more reliable, but it is just a matter of time before big data and artificial intelligence will be able to match growers with regard to optimizing growth.
“We are trying to develop this platform to connect growers and controlled environment researchers with machine learning and data scientists. I’m not sure the controlled environment researchers have grasped the potential that is available. We are not using this technology. By establishing this platform and collecting and disseminating the data, we see real potential to help advance the horticulture industry.”
For more: Erico Mattos, Greenhouse Lighting and Systems Engineering (GLASE), (302) 290-1560; firstname.lastname@example.org.
More info on CEAOD
Want to learn more about the Controlled Environment Agriculture Open Data project? Then check out these two upcoming events.
Aug. 4, 2-3 p.m. EDT
GLASE webinar: Controlled Environment Agriculture Open Data project. Presented by Erico Mattos, executive director of GLASE, and Kenneth Tran, founder of Koidra LLC.
Aug. 13, 10:30 a.m.-12 p.m. EDT
American Society for Horticultural Science presentation: The Promise of Big Data and New Technologies in Controlled Environment Agriculture. Presented by Erico Mattos.
David Kuack is a freelance technical writer in Fort Worth, Texas; email@example.com.