November 4th, 2024

Announcing GitHub Copilot in Data Wrangler

Jeffrey Mew
Senior Product Manager

AI did not write this blog post, but it will make your exploratory data analysis with Data Wrangler better!

Today, we’re excited to introduce our first step toward integrating the power of Copilot into Data Wrangler.

With this first integration of Copilot with Data Wrangler, you’ll be able to:

  • Use natural language to clean and transform your data
  • Get help with fixing errors in your data transformation code

 

An example of using Copilot in Data Wrangler to filter for listings that allow dogs/cats
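For a sense of what such a request boils down to, here is a minimal pandas sketch of a pet-friendly filter. The column names `dogs_allowed` and `cats_allowed`, the sample rows, and the helper name are hypothetical, and we assume "dogs/cats" means listings that allow either one. The code Copilot actually generates will depend on your dataset and prompt.

```python
import pandas as pd


def filter_pet_friendly(df: pd.DataFrame) -> pd.DataFrame:
    """Keep listings that allow dogs or cats (hypothetical boolean columns)."""
    # Element-wise OR of the two boolean columns builds the row mask.
    mask = df["dogs_allowed"] | df["cats_allowed"]
    return df[mask]


# Made-up sample data standing in for the listings dataset.
listings = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "dogs_allowed": [True, False, False, True],
        "cats_allowed": [False, False, True, True],
    }
)

pet_friendly = filter_pet_friendly(listings)
print(pet_friendly["id"].tolist())  # [1, 3, 4]
```

In Data Wrangler you would see the effect of a step like this in the data preview before deciding whether to keep it.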

 

A common limitation of using AI tools for exploratory data analysis tasks today is the lack of data context provided to the AI. Responses are typically more generalized and not tailored to the specific task or data at hand. In addition, there’s always the manual and tedious task of verifying the correctness of the generated code.

What makes Copilot with Data Wrangler different is twofold. First, this integration allows you to choose to provide Copilot with your data context, enabling it to generate more relevant and specific code for the exact dataset you have open. Second, you get to preview the exact behavior of the code on your dataset with the Data Wrangler interface to visually validate Copilot’s response, along with all the benefits that the Data Wrangler tool provides.

Data transformations

With Copilot in Data Wrangler, you can ask it to perform ambiguous, open-ended transformations or a specific task you have in mind. Below we’ve included three examples of the many possibilities you can achieve with Copilot in Data Wrangler:

Formatting a datetime column
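As a rough illustration of the datetime example, the sketch below parses a column and rewrites it in a single format. The column name `last_review`, the target format, and the helper name are assumptions made for this example, not the code shown in the screenshot.

```python
import pandas as pd


def format_datetime_column(
    df: pd.DataFrame, column: str, fmt: str = "%Y-%m-%d"
) -> pd.DataFrame:
    """Return a copy of df with `column` rewritten as strings in `fmt`."""
    out = df.copy()
    # Values that cannot be parsed become NaT instead of raising an error.
    parsed = pd.to_datetime(out[column], errors="coerce")
    # strftime leaves missing values (NaT) as missing in the result.
    out[column] = parsed.dt.strftime(fmt)
    return out


# Hypothetical sample data, including one value that is not a date.
df = pd.DataFrame(
    {"last_review": ["2024-11-04 08:30:00", "2023-01-15 17:45:10", "not a date"]}
)

formatted = format_datetime_column(df, "last_review")
print(formatted)
```

Working on a copy keeps the original frame untouched, which makes it easy to compare the before and after states.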


Removing any column(s) with over 40% missing values
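One way a transformation like this could be written in pandas is sketched below. The sample columns and the helper name `drop_sparse_columns` are made up for illustration; only the 40% threshold comes from the example above.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.40) -> pd.DataFrame:
    """Drop every column whose share of missing values exceeds max_missing."""
    # isna() marks missing cells; the column-wise mean is the missing fraction.
    missing_fraction = df.isna().mean()
    # Keep only the columns at or below the threshold.
    return df.loc[:, missing_fraction <= max_missing]


# Hypothetical sample data.
df = pd.DataFrame(
    {
        "price": [100.0, 120.0, None, 90.0, 110.0],  # 20% missing
        "reviews": [None, None, None, 4.5, 4.8],     # 60% missing
        "city": ["A", "B", "C", "D", "E"],           # 0% missing
    }
)

cleaned = drop_sparse_columns(df)
print(list(cleaned.columns))  # ['price', 'city']
```

Exposing the threshold as a parameter makes it simple to try a stricter or looser cutoff on the same data.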


Fixing an error in a data transformation

Getting started today

To use Copilot with Data Wrangler, you need to meet the following three prerequisites:

  1. You must have the Data Wrangler extension for VS Code installed.
  2. You must have the GitHub Copilot extension for VS Code installed.
  3. You must have an active subscription for GitHub Copilot in your personal account, or you need to be assigned a seat by your organization. Sign up for a GitHub Copilot free trial in your personal account.

 

Follow these steps to Set up GitHub Copilot in VS Code.

Once the prerequisites are met, the Copilot interface appears within Data Wrangler by default when you are in Editing Mode (you can change this in the Data Wrangler settings). You can then either select the input box or use the default Copilot keyboard shortcut: Cmd+I on macOS or Ctrl+I on Windows and Linux.

Responsible AI

AI is not perfect (neither are we!) and it will improve over time. Microsoft and GitHub Copilot follow Responsible AI principles and employ controls to ensure that your experience with the service is appropriate, pleasant, and useful. We understand there is hesitation and concern surrounding the rapid expansion of AI’s capabilities, and we fully respect those who don’t want to, or can’t, use Copilot.

If you have any feedback on the Copilot experience in Data Wrangler, please file an issue in our public Data Wrangler GitHub repository.

Next Steps

We are just getting started. This is the first experience in Data Wrangler that we are enhancing with Copilot. Stay tuned for more AI-powered experiences in Data Wrangler to help with your data analysis needs soon!

 

Author

Jeffrey Mew
Senior Product Manager

PM working on all things data science for VS Code (#PyTorch #TensorFlow)
