The U.S. federal government is seeking technology that can generate synthetic data to improve machine learning models, particularly for cybersecurity. Synthetic data, meaning artificially generated data, becomes crucial when real-world data is unavailable or poses privacy and security risks. The Department of Homeland Security (DHS) is spearheading the initiative, with its Science and Technology Directorate offering contracts worth up to $1.7 million over three years. The effort aims to address the challenges of using sensitive real-world data while preserving privacy safeguards.
The DHS solicitation emphasizes how difficult it is to use the vast amount of sensitive data the department generates across organizational boundaries. The Silicon Valley Innovation Program, which awards the contracts, operates outside traditional government contracting procedures, allowing for a more agile approach. Proposed solutions should be able to “generate synthetic data that models and replicates the shape and patterns of real data while safeguarding privacy.” The initiative aligns with a broader effort: the U.S. federal Chief Data Officers Council is also seeking input on synthetic data to develop best practices.
Demand for synthetic data is growing, with Gartner predicting that it will make up 60% of the data consumed by AI this year. Synthetic data serves various purposes, such as improving the accuracy of machine learning models and testing systems where live data is unavailable. The solicitation calls for solutions that support both structured and unstructured data types, address bias concerns, and prevent real data from being reverse-engineered from synthetic sources. DHS envisions using synthetic data to simulate attacks on cyber-physical systems and to amplify synthetic data elements to detect real-world threats. Interested companies can submit responses until April 10.