Azure Synapse Analytics data lake features: up close

Microsoft has added a slew of new data lake features to Synapse Analytics, features based on Apache Spark. It also integrates Azure Data Factory, Power BI and Azure Machine Learning. These features are still in public preview, but that's good enough for us to take a visual tour of what's new.

  • Home screen

    Though still in public preview, Synapse Analytics has added a slew of new data lake features, based on Apache Spark, to the platform.

    But it's much more than that. With Synapse Studio, Synapse Analytics' browser-based development environment, a host of capabilities come together. With the help of this tool, Synapse combines not only data warehouse and data lake, but also data engineering and data science; BI and AI; cluster computing and serverless computing; T-SQL and Spark SQL; notebooks and scripts; and Python, Scala and C#.

    I created this gallery for two purposes: as a show-and-tell to help readers understand the public preview features in the service, and to structure my own learning and understanding of them.

    All of the code and work here is based on examples from Microsoft, but the hands-on work and screenshots are original.

  • Home screen new object menu

    From the home screen, you can directly create any of the major assets Synapse Studio lets you author. These include SQL scripts (against the warehouse or the lake), Jupyter notebooks (in a customized Synapse Studio experience), Spark batch job definitions, Azure Data Factory pipelines and Mapping Dataflows, and Power BI reports. You can also import (read: upload) existing scripts and notebooks.

  • Left navbar

    The left navbar in Synapse Studio provides a good overview of the tool's capabilities, as well as those of the platform overall. With buttons for looking at data, doing development work, creating pipeline orchestrations, monitoring ongoing processes and managing assets (like server pools, linked services and orchestration triggers), Synapse brings together an enormous number of capabilities. And there's integration with Power BI and Azure Machine Learning, too.

    Click through all the slides in this gallery for details.

  • Monitoring your SQL pools

    From the Synapse Studio "Manage" screen, it's easy to see a list of all your Synapse SQL pools, including the serverless one for SQL on-demand. You can pause, create and delete pools here too, of course.

  • Databases view provides unified view of warehouse and lake data

    Clicking the Data button on the left navigation bar displays all your data in Synapse, whether it's in a data warehouse (SQL pool) or the data lake (Spark). You can also create and open a starter notebook that queries one of your data lake tables by right-clicking it, then choosing "New notebook" and "Load to DataFrame", as shown here.
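
    The generated starter cell is just a line or two of PySpark. Here's a minimal sketch of what it typically contains; the database and table name ("nyctaxi.trip") is a placeholder for whichever table you right-clicked:

    ```python
    # Minimal sketch of a "Load to DataFrame" starter cell (PySpark).
    # "nyctaxi.trip" is a placeholder; substitute the database.table you right-clicked.
    df = spark.sql("SELECT * FROM nyctaxi.trip LIMIT 100")
    display(df)  # Synapse notebooks render the DataFrame as an interactive grid or chart
    ```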

  • SQL on-demand

    Synapse's SQL on-demand lets you query data in your data lake (be it in Spark databases or just sitting as a file in Azure storage) using the same Transact-SQL (T-SQL) language used for SQL pools, Azure SQL Database and SQL Server.

    The SQL script editor lets you run queries and view the result sets in tabular format or using the same native data visualizations available in Synapse notebooks. A query against data from the Twitter API, and a visualization of that data, are shown here.

    Obviously, querying the data warehouse with T-SQL works too. Just pick your SQL pool (instead of "SQL on-demand") from the "Connect to" drop-down list.
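
    The heart of such a script is an OPENROWSET query over files in the lake. If you'd rather submit that kind of query from Python instead of the script editor, a rough sketch using pyodbc (a substitution of mine, not something shown in the slide) looks like this; the workspace endpoint, storage path and authentication settings are placeholders:

    ```python
    # Rough sketch: running a SQL on-demand (serverless) query from Python via pyodbc.
    # The endpoint name, lake path and authentication details below are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=myworkspace-ondemand.sql.azuresynapse.net;"
        "Database=master;"
        "Authentication=ActiveDirectoryInteractive;"
    )

    # The same OPENROWSET pattern a SQL on-demand script uses to read lake files directly.
    query = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/raw/tweets/*.parquet',
        FORMAT = 'PARQUET'
    ) AS tweets
    """

    for row in conn.execute(query):
        print(row)
    ```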

  • Notebook

    Another way to query your data is with code in notebooks. Synapse notebooks let you mix markdown text, code (in several languages) and data visualizations in one user interface. Python code and visualization output from the popular matplotlib library are shown here.
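
    Here's roughly the shape of the cell shown in this slide: query a lake table with Spark, convert a small aggregate to pandas, and chart it with matplotlib. The table and column names are placeholders:

    ```python
    # Sketch of a typical Synapse notebook cell: Spark SQL for the heavy lifting,
    # pandas + matplotlib for the chart. "nyctaxi.trip" and its columns are placeholders.
    import matplotlib.pyplot as plt

    pdf = spark.sql("""
        SELECT passenger_count, COUNT(*) AS trips
        FROM nyctaxi.trip
        GROUP BY passenger_count
        ORDER BY passenger_count
    """).toPandas()

    pdf.plot(kind="bar", x="passenger_count", y="trips", legend=False)
    plt.xlabel("Passengers per trip")
    plt.ylabel("Trip count")
    plt.show()  # the rendered chart appears inline, beneath the cell
    ```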

  • Notebook Options

    Synapse Analytics notebooks provide a customized user interface skin over standard Jupyter notebooks. They can use Python, Scala, Spark SQL or even .NET/C# language kernels (C# is highlighted, although the code shown is in Python). Regardless of the default language for a notebook, any given cell within it can contain code in any one of the supported languages.

    Notebooks can connect to any Synapse Spark pool by choosing it from the "Attach to" drop-down list highlighted in this figure. Note that the highlighted button at the top-right will create a new pipeline and add the notebook to it.

  • .NET Dataframes and UDFs

    You can query the data lake in C# (highlighted in the Language dropdown list) using the .NET DataFrame library. You can even create user-defined functions in C# and reference them when the DataFrame is created, as illustrated by the highlighted source code in this slide.
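
    The C# code itself isn't reproduced here, but the pattern, defining a UDF and then referencing it as you build up a DataFrame, is the same one PySpark uses. A hedged Python sketch of that pattern, with made-up table and column names, looks like this:

    ```python
    # PySpark analogue of the pattern in this slide: define a user-defined function (UDF)
    # and reference it when building a DataFrame. (The slide's actual code uses the
    # .NET for Apache Spark API in C#; this sketch just mirrors the idea in Python.)
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    @udf(returnType=StringType())
    def shorten(text):
        # Hypothetical helper: truncate long text values to 40 characters.
        if text is None or len(text) <= 40:
            return text
        return text[:40] + "..."

    df = spark.read.table("nyctaxi.trip")                    # placeholder table
    df = df.withColumn("short_note", shorten(col("note")))   # "note" column is illustrative
    df.show(5)
    ```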

  • Data Factory pipelines

    You can create Azure Data Factory (ADF) pipelines in Synapse Analytics too. Using Synapse Studio integration, it's easy to create a simple pipeline that will run the code in a notebook on a scheduled basis. To do so, just drag and drop the notebook from the object explorer onto the pipeline design canvas, as shown here, and then create a trigger by clicking the highlighted "Trigger" toolbar button. Click the "Publish All" toolbar button, highlighted at the top-left of the screen, to save and register the pipeline and trigger.

    While not shown here, you can create ADF mapping data flows, and use ADF's Copy Data tool, too.

  • New linked service

    You can connect to data in myriad other platforms (including competing ones, like Amazon Redshift, selected here) by creating a Synapse Analytics "linked service." If you link to a Power BI workspace, you can create Power BI reports against data in Synapse, and even edit the report in Synapse Studio. (Report creation and editing are shown in the next two slides.)

  • Create a Power BI dataset from Synapse Studio

    You can create a dataset in the Power BI cloud service, pointing to your data warehouse, directly in Synapse Studio. Simply hover over your SQL pool in the list and download a special .pbids file, which contains all the connection information you'll need. Next, double-click the file to create a new report against your warehouse in Power BI Desktop. Once you publish the report, the dataset will be created, and you'll be able to edit the report further right in Synapse Studio.

  • Editing a Power BI Report

    Here's the Power BI editing experience. Put simply, the Power BI Web interface is hosted within Synapse Studio. But Power BI also gets its own node in Synapse Studio's Develop screen, along with scripts, notebooks and Spark job definitions.

  • Azure ML integration, via code

    Beyond Power BI, Synapse Analytics also integrates with Azure Machine Learning (Azure ML). For now, though, that integration is only through code. Not that there's anything wrong with that: Synapse's (and Spark's) affinity for Python code, combined with the availability of the Azure ML SDK for Python, means it all fits together.

    Here, you can see code in a Synapse Analytics notebook that uses the Azure ML SDK to perform an AutoML experiment. Notice that the console output from Azure ML streams back into the notebook cell, shown in the bottom half of this slide. The Synapse Spark cluster itself is used for the compute.
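
    A heavily trimmed sketch of that kind of cell, using the Azure ML SDK for Python, looks something like this; the workspace details, training data and AutoML settings are placeholders rather than the exact code shown in the slide:

    ```python
    # Condensed sketch of driving an Azure ML AutoML experiment from a Synapse notebook.
    # Subscription, workspace, table and label names are placeholders; a real notebook
    # would also wire up authentication and the compute to use.
    from azureml.core import Workspace, Experiment
    from azureml.train.automl import AutoMLConfig

    ws = Workspace(subscription_id="<subscription-id>",
                   resource_group="<resource-group>",
                   workspace_name="<azure-ml-workspace>")

    train_pdf = spark.read.table("nyctaxi.trip").limit(10000).toPandas()  # placeholder data

    automl_config = AutoMLConfig(task="regression",
                                 training_data=train_pdf,
                                 label_column_name="fare_amount",  # illustrative label column
                                 iterations=10,
                                 primary_metric="normalized_root_mean_squared_error")

    experiment = Experiment(ws, "synapse-automl-demo")
    run = experiment.submit(automl_config, show_output=True)  # output streams into the cell
    ```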

  • Create a Spark batch job

    In addition to notebooks and SQL on-demand scripts, you can run a job on your Synapse Spark cluster based on a Python script, a Scala jar file or a zipped-up .NET Core package. You do this by creating a new Spark job definition in Synapse Studio's Develop screen. Here we are creating a job that runs the classic wordcount algorithm, using a Python script that processes a file of text from Shakespeare.
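
    The script behind a job like this is only a few lines of PySpark. Here's a minimal sketch; the ADLS input and output paths are placeholders, and the actual sample script's details may differ:

    ```python
    # Minimal sketch of a wordcount script suitable for a Synapse Spark job definition.
    # The abfss:// paths are placeholders pointing at the workspace's ADLS Gen2 account.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    lines = spark.read.text("abfss://data@mydatalake.dfs.core.windows.net/shakespeare.txt")
    counts = (lines.rdd.flatMap(lambda row: row.value.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile("abfss://data@mydatalake.dfs.core.windows.net/output/wordcount")

    spark.stop()
    ```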

    The highlighted Submit button actually runs the job. Once the job has been submitted successfully, a hyperlink (also highlighted) appears that lets you monitor the job's progress in Synapse Studio's Monitor screen. You can see this in the next slide.

  • Spark job monitoring

    Use Synapse Studio's Monitor screen to watch (or even "replay") the status of a Spark job. Note the highlighted "Spark history server" toolbar button that allows you to monitor the job in the Spark UI instead. Advance to the next slide in this gallery to see that.

  • Monitoring a job via the Spark UI

    After clicking the Spark UI button from Synapse Studio's Monitor screen, the standard Spark UI appears. You can drill down on the tasks within your Spark job, open a specific task and further drill down on its event timeline, as we've done here.

  • Inspecting Spark job output

    It's easy to see the results of your Spark job by viewing the contents of its Azure Data Lake Storage (ADLS) output folder.

    For our wordcount job, we can use the ADLS blade in the Azure portal to view the output folder, the path to which is highlighted. Also highlighted are the zero-byte "_SUCCESS" file, and two output sequence files. Partial output from the first sequence file is displayed and highlighted in the middle of the screen.
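
    If you'd rather inspect the output programmatically than through the portal blade, a rough sketch using the azure-storage-file-datalake Python SDK looks like this; the account, container, folder and file names are placeholders:

    ```python
    # Sketch: list and read the wordcount output folder with the ADLS Gen2 Python SDK.
    # Account, container, folder and file names below are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://mydatalake.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    fs = service.get_file_system_client("data")

    # List everything the job wrote, including the zero-byte _SUCCESS marker.
    for path in fs.get_paths(path="output/wordcount"):
        print(path.name)

    # Read the first chunk of one output file (its exact name will vary by run).
    file_client = fs.get_file_client("output/wordcount/part-00000")
    print(file_client.download_file().readall().decode("utf-8")[:500])
    ```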

  • Now create your own workspace

    Ready to try this on your own? The new Synapse Analytics workspaces are in public preview and ready for you to provision. Just head over to the Azure portal, create a new resource, search for "Synapse" and click on "Azure Synapse Analytics (workspaces preview)" to bring up the create screen shown here. Then go to Microsoft's "Get Started with Azure Synapse Analytics" page to get going.



By Andrew Brust for Big on Data | September 9, 2020 -- 13:00 GMT (21:00 SGT) | Topic: Big Data Analytics


