The ASTRONOMY-AWS coupling.

Satyansh Srivastava
8 min read · Oct 17, 2020

Have you ever given a thought to how your Google Drive works? This pristine technology saves your data to the cloud within minutes and lets you work on it anytime and from anywhere. The answer can be summed up in two words: Cloud Computing. Let's first get an overview of what Cloud Computing actually is, and then look at Cloud Computing in the field of Astronomy.

Cloud Computing

Cloud computing is the technology that has driven business for numerous companies and firms over the past decade and is emerging as the backbone for most of today's tech giants. It provides IT resources like storage, computing power, databases, etc. on a pay-as-you-go basis over the web. It cuts down the cost of buying, owning, and maintaining physical hardware and workspaces, making it a comparatively faster and cheaper alternative for firms to build on. AWS (Amazon Web Services) has been a leader in cloud computing, providing solutions to almost every industry, from Education to Telecommunication, from Manufacturing to Energy, while offering ample opportunities for every use case in technology, from DevOps to Scientific Computing, paired with agility, durability, and availability.

The ASTRONOMICAL constraints

Many stellar experiments are ongoing in Astronomy, and they require the processing of tons of data every day at an unprecedented speed. Had cloud computing not been around, it would have been a herculean task to tackle such enormous volumes of data, which can pile up at rates of up to 1 GB per minute.

THE INAF-AWS CONFIGURATION

Experiments like the E-ELT (ESO's Extremely Large Telescope) and the CTA (Cherenkov Telescope Array) at the National Institute for Astrophysics (Istituto Nazionale di Astrofisica, or INAF) are being carried out successfully thanks to the cloud computing power of AWS.

Figure 1 — AWS architecture for the ESO HiRes simulation. Input from the spectrograph design is uploaded to Amazon S3. AWS Lambda then launches EC2 g2x.large instances to perform a CUDA simulation, and the results are stored back on S3.
Figure 2 — AWS architecture for CTA simulations. As in the case of HIRES, the architecture provides triggers from S3 as soon as the input for simulations is uploaded. An Amazon SQS FIFO queue is used to dispatch simulations between EC2 instances. Then, the processed data is sent back to S3. They make use of Docker to containerize the software and Amazon Glacier for long-term storage.

The former project involves the ultra-high-resolution spectrograph HiRes, built to detect signatures of life outside our Solar System, giving us the ability to detect complex life forms and to complete a census of the composition of Earth-like planets that orbit inside their host stars' habitable zones. The HiRes instrument produces terabytes of data that require heavy simulations to obtain insightful results, as shown in Figure 1.
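To make the Figure 1 flow more concrete, here is a minimal sketch of what an S3-triggered Lambda handler for such a pipeline could look like. This is not INAF's actual code: the AMI ID, instance type, bucket name, and simulation command are hypothetical placeholders.

```python
# Hypothetical sketch of an S3-triggered Lambda that launches an EC2 worker
# for a GPU simulation job (inspired by the Figure 1 description, not INAF's code).
import boto3

ec2 = boto3.client("ec2")

# Placeholder values: a real deployment would use its own AMI, instance
# type, and result bucket.
SIMULATION_AMI = "ami-0123456789abcdef0"   # hypothetical AMI with CUDA tooling
INSTANCE_TYPE = "g4dn.xlarge"              # any GPU-capable instance type
RESULT_BUCKET = "example-hires-results"    # hypothetical output bucket


def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event when a spectrograph input lands."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Boot a short-lived worker that downloads the input, runs the CUDA
        # simulation, uploads the result, and shuts itself down.
        user_data = f"""#!/bin/bash
aws s3 cp s3://{bucket}/{key} /tmp/input.dat
/opt/simulator/run_cuda_sim /tmp/input.dat /tmp/output.dat
aws s3 cp /tmp/output.dat s3://{RESULT_BUCKET}/{key}.out
shutdown -h now
"""
        ec2.run_instances(
            ImageId=SIMULATION_AMI,
            InstanceType=INSTANCE_TYPE,
            MinCount=1,
            MaxCount=1,
            UserData=user_data,
            InstanceInitiatedShutdownBehavior="terminate",
        )
    return {"status": "simulation instances launched"}
```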

The latter project involves the observation of extragalactic sources that radiate photons in the gamma-ray band, allowing for the study of high-energy physics. The simulation of the CTA data required around 300,000 CPU hours of processing on the cloud.

For both of these projects, INAF used Amazon Elastic Compute Cloud (EC2) for the calculations, Amazon Simple Storage Service (S3) for storing the received and processed data, and AWS Lambda and Amazon SQS for managing the flow of work among the EC2 instances, paired with Amazon S3 Glacier for storing the data cost-effectively, as shown in Figures 1 and 2.
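The Figure 2 pipeline pairs that S3 trigger with an SQS FIFO queue so simulations can be dispatched across a fleet of EC2 workers. A rough sketch of the worker side might look like the following; the queue URL, bucket names, and Docker image are again hypothetical, not INAF's real setup.

```python
# Hypothetical sketch of an EC2 worker polling the SQS FIFO queue from the
# Figure 2 description, running a containerized simulation, and storing the
# output back on S3. All names below are placeholders.
import json
import subprocess
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/cta-jobs.fifo"
OUTPUT_BUCKET = "example-cta-results"
SIM_IMAGE = "example/cta-simulator:latest"   # hypothetical Docker image


def work_forever():
    while True:
        # Long-poll the FIFO queue for the next simulation job.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])        # e.g. {"bucket": ..., "key": ...}
            s3.download_file(job["bucket"], job["key"], "/tmp/input.dat")

            # Run the containerized simulation on the downloaded input.
            subprocess.run(
                ["docker", "run", "--rm", "-v", "/tmp:/data",
                 SIM_IMAGE, "/data/input.dat", "/data/output.dat"],
                check=True,
            )

            # Push the result back to S3; a lifecycle rule can then move it
            # to Glacier for long-term storage.
            s3.upload_file("/tmp/output.dat", OUTPUT_BUCKET, job["key"] + ".out")
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )


if __name__ == "__main__":
    work_forever()
```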

THE NASA-AWS METASTATE

The NASA Image and Video Library evolution

The ever-expanding universe (or at least until we hit the Big Freeze or the Big Rip) has a lot to offer and explore. From the silent starry nights to the rigorous blazing days, the universe gets even more beautiful with every passing second. The marvels of this universe can be unwound and contemplated at the NASA Image and Video Library, which provides easy access to more than 140,000 still images, audio recordings, and videos documenting NASA's more than half a century of achievements in exploring the vast unknown. These resources weren't always available at our fingertips. NASA made its photos and videos open to the public as early as the 2000s, but if you needed a video of a Space Shuttle launch, it was available on the Kennedy Space Center website, and if you wanted pictures from the Hubble Space Telescope, you had to go to the Goddard Space Flight Center website. With the data distributed across different websites, it was challenging for users to find what they needed, and obtaining the desired material required substantial digging.

Some efforts were made to bring this data, which was spread over different locations, together in one place. “In large part, those initial efforts were unsuccessful because each center categorized its imagery in different ways,” says Rodney Grubbs, Imagery Experts Program Manager at NASA. “As a result, we often had five to six copies of the same image, each described in different ways, which made searches difficult and delivered a poor user experience.”

In 2011, NASA decided that the best approach to keeping all the data in one place for easy access was to redesign the mechanism from scratch. After much trial and error, NASA settled on integrating its systems with the cloud. “We wanted to build our new solution in the cloud for two reasons,” says Grubbs. “By 2014, like with many government agencies, NASA was trying to get away from buying hardware and building data centers, which are expensive to build and manage. The cloud also provided the ability to scale with ease, as needed, paying for only the capacity we use instead of having to make a large up-front investment.”

This approach was a sheer success. AWS helps the library run as an immutable infrastructure in a fully automated environment. The AWS metastate (as I call it) consists of: Amazon Elastic Compute Cloud (EC2), which lets NASA scale its computing up and down depending on how much it needs; Elastic Load Balancing (ELB), which balances the traffic arriving at any particular instant; Amazon Simple Storage Service (S3), which provides object storage for incoming (uploaded) media, metadata, and published assets; and many more services like Amazon RDS, SQS, etc. All of this made the NASA Image and Video Library a state-of-the-art solution: a handy manifestation of all that the universe beholds, in the form of images, videos, and more.

The architecture of the NASA-AWS metastate representing how NASA holds the data together.
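For a sense of how such a pipeline hangs together, here is a toy sketch of the ingest step: an asset is uploaded to S3 with its metadata attached, and a message is queued for downstream processing by the EC2 fleet. The bucket, queue, and asset names are made up for illustration; this is not NASA's actual code.

```python
# Toy sketch of a media-ingest step for an S3 + SQS architecture like the one
# described above. The bucket and queue names are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

INGEST_BUCKET = "example-imagery-ingest"
PROCESSING_QUEUE = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-jobs"


def ingest_asset(local_path, key, title, center, keywords):
    """Upload a media asset to S3 with its metadata and enqueue it for processing."""
    s3.upload_file(
        local_path,
        INGEST_BUCKET,
        key,
        ExtraArgs={"Metadata": {"title": title, "center": center}},
    )
    # Downstream workers (e.g. transcoding or thumbnail generation on EC2)
    # pick the job up from this queue.
    sqs.send_message(
        QueueUrl=PROCESSING_QUEUE,
        MessageBody=json.dumps(
            {"bucket": INGEST_BUCKET, "key": key, "keywords": keywords}
        ),
    )


# Example usage with a made-up asset.
ingest_asset(
    "apollo11_launch.jpg",
    "images/apollo11_launch.jpg",
    title="Apollo 11 launch",
    center="KSC",
    keywords=["Apollo 11", "launch"],
)
```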

The NASA-AWS and CMEs conflict

NASA has developed an exemplary method of tackling solar storms and Coronal Mass Ejections (CMEs) by using AWS machine learning solutions for unsupervised learning and anomaly detection. But before digging into how the solution built with AWS works, let us first understand CMEs.

STEREO’s View of July 23, 2012, CME. Credit: NASA/STEREO

Coronal Mass Ejections (CMEs) are massive expulsions of plasma and magnetic field from the Sun's corona. They can eject billions of tons of coronal material and carry an embedded magnetic field (frozen in flux) that is stronger than the background solar wind interplanetary magnetic field (IMF).

On March 13, 1989, a severe geomagnetic storm struck, leading to numerous anomalies. It began with extremely intense auroras at the poles and caused a nine-hour outage of Hydro-Québec's electricity transmission system, leaving over six million people without power. Meanwhile, in the United States, around 200 power-grid malfunctions were reported, and, more worryingly, a step-up transformer at the Salem Nuclear Power Plant in New Jersey failed and was put out of commission.

An illustration depicting the interaction of a solar flare with the Earth's magnetic field.

The question arises: why can't we predict these storms and develop countermeasures to tackle such anomalies? The answer is the lack of historical data to train our models on. A superstorm of this kind is a sporadic event, occurring roughly once every 50 years, so a typical supervised learning algorithm proves largely ineffectual at tackling the scenario and suggesting the necessary counter-steps. Moreover, the data collected by numerous satellites is too large to be correlated using conventional methods. Therefore, NASA has teamed up with AWS, using its machine learning tooling to work on giant data sets quickly and conveniently. NASA's anomaly detection relies on simultaneous observations of solar wind drivers and the responses in the magnetic fields around Earth. These geomagnetic superstorms can be modeled as anomalous outlier events among ordinary solar storms.

Regions affected by blackouts from the July 14, 2017, solar flare.

These superstorms can be detected by monitoring the particle density in the Earth's ionosphere. Most of the ionosphere is electrically neutral, but solar radiation can dislodge electrons from atoms and molecules. Consequently, during a superstorm the particle density can increase manifold on the sun-facing side of the planet, while on the night side the lack of sunlight can leave a hole in the ionosphere. NASA uses Amazon SageMaker to train an anomaly detection model with the built-in Random Cut Forest (RCF) algorithm on heliophysics datasets compiled from numerous ground- and satellite-based instruments. RCF assigns an anomaly score to each data point: higher scores indicate an anomaly in the dataset, while lower scores are considered normal.

The AWS architecture that NASA uses to examine and process the satellite data and derive insights. Credit: https://www.amazon.science
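As an illustration of that approach, here is a compact sketch using the SageMaker Python SDK's built-in Random Cut Forest estimator. The execution role, instance types, and the synthetic series standing in for the heliophysics data are all placeholders, not NASA's actual setup.

```python
# Minimal sketch of training a Random Cut Forest anomaly detector with the
# SageMaker Python SDK. The role and the synthetic data are placeholders.
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"  # hypothetical role

# Stand-in for a heliophysics time series (e.g. ionospheric particle density):
# mostly quiet background with a handful of injected spikes.
data = np.random.normal(loc=100.0, scale=5.0, size=(10000, 1))
data[::1000] *= 5.0  # injected "superstorm-like" outliers

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=512,
    num_trees=50,
    sagemaker_session=session,
)

# Train on the record set, then deploy an endpoint that returns an anomaly
# score per data point; higher scores flag candidate superstorm intervals.
rcf.fit(rcf.record_set(data.astype("float32")))
predictor = rcf.deploy(initial_instance_count=1, instance_type="ml.m5.large")
scores = predictor.predict(data[:100].astype("float32"))
```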

With all this data and these anomalies to examine, researchers can develop a better picture of what causes them and how they are interconnected. NASA and AWS are building a centralized data lake to allow researchers to access and analyze this cosmological data with dynamic cloud computing resources.

Monitoring these anomalies and creating numerous working models of these storms, or extrapolating them into superstorms, can help us gain deeper insight into these phenomena and derive smarter methods for working with the data.

“Research in heliophysics involves working with many instruments, often in different space or ground-based observatories. There's a lot of data, and factors like time lags add to the complexity. With Amazon, we can take every single piece of data that we have on superstorms and use anomalies we have detected to improve the models that predict and classify superstorms effectively,” says Janet Kozyra, a heliophysicist who leads this project from NASA headquarters in Washington, D.C.

There are many more use cases wherein Astronomy uses AWS as an underlying infrastructure to overcome its limits in computational power or resource management.

