Create a DataFrame with Column Names in Spark Scala

In Spark, a DataFrame is a distributed collection of data organized into rows, where each row consists of a set of columns, and each column has a name and an associated type. Conceptually, DataFrames are equivalent to a table in a relational database or a data frame in R or Python. Since the Spark 2.0 release there are three types of data abstractions that Spark officially provides: RDD, DataFrame, and Dataset; for a new user it can be confusing to understand the relevance of each one. DataFrames share some common characteristics with RDDs, notably immutability: we can create a DataFrame or RDD once but can't change it afterwards. The Datasets API provides the benefits of RDDs (strong typing, the ability to use powerful lambda functions) together with the benefits of Spark SQL's optimized execution engine, which is why, while working in Apache Spark with Scala, we often need to convert an RDD to a DataFrame or Dataset.

The simplest way to create a DataFrame with named columns is to put a few records into data objects using the Seq class and then create the DataFrame from that sequence, passing the column names to toDF. Alternatively, a case class can define the schema of the table: the names of the arguments to the case class are read using reflection and they become the names of the columns. We can also create a DataFrame programmatically using three steps: create an RDD of Rows, create the matching schema, and apply the schema to the RDD. A few practical notes: the import spark.implicits._ statement can only be run after the SparkSession has been created; you need to import org.apache.spark.sql.Column when working with Column objects directly (the target type triggers the implicit conversion to Column in val idCol: Column = $"id"); to create a constant column in a Spark dataframe, you can make use of the withColumn() method; and to select a column from the data frame, use apply in Scala and col in Java. It is even possible to create an empty DataFrame that has column names, say (Fruit, Cost, Quantity), but no data, by supplying an explicit schema. To select columns from a dataframe with a sequence of strings, pass the head and splat the tail: given val colNames = Seq("c1", "c2"), use df.select(colNames.head, colNames.tail: _*). Finally, to query a DataFrame with SQL, call createOrReplaceTempView (Spark 2.0) or createGlobalTempView on it. These creation patterns are shown in the sketch below.
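Here is a minimal sketch of the creation patterns above, runnable as a small local Spark application. The column names Fruit, Cost, and Quantity come from the text; the case class, the sample values, and the app/master settings are illustrative assumptions.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Case class whose argument names become the column names via reflection.
case class Fruit(name: String, cost: Double, quantity: Int)

object CreateDataFrameExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("create-df").master("local[*]").getOrCreate()
    import spark.implicits._ // must come after the SparkSession exists

    // 1) From a Seq of tuples, naming the columns in toDF.
    val fromTuples = Seq(("Apple", 1.5, 10), ("Pear", 2.0, 5)).toDF("Fruit", "Cost", "Quantity")
    fromTuples.show()

    // 2) From a Seq of case class instances; column names come from the case class fields.
    val fromCaseClass = Seq(Fruit("Apple", 1.5, 10), Fruit("Pear", 2.0, 5)).toDF()
    fromCaseClass.printSchema()

    // 3) An empty DataFrame with column names (Fruit, Cost, Quantity) but no data.
    val schema = StructType(Seq(
      StructField("Fruit", StringType, nullable = true),
      StructField("Cost", DoubleType, nullable = true),
      StructField("Quantity", IntegerType, nullable = true)))
    val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
    empty.printSchema()

    spark.stop()
  }
}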
Adding computed columns is just as common as creating the DataFrame itself. One pattern is to pass 2 columns of an input DataFrame to a UDF, here called "addColumnUDF", and bind the result with withColumn; we can then use select() to select the new column, collect() to collect it into an Array[Row], and an accessor such as getString() to access the data inside each Row. When withColumn is given an existing column name, as in df.withColumn("year", $"year".cast(IntegerType)), the column is replaced with the new one, so you don't need to add a new column and then delete the old one. Keep in mind that typed views behave differently: Dataset.as simply changes the view of the data that is passed into typed operations (e.g. map) and does not eagerly project away any columns that are not present in the specified class.

Two related configuration points: Spark SQL can cache tables in memory with spark.catalog.cacheTable("tableName") or dataFrame.cache(), and with the spark.sql.statistics.histogram.enabled configuration property turned on, the ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS SQL command generates column (equi-height) histograms. This is a getting-started tutorial and assumes minimal knowledge of Spark and Scala; you can copy and paste the code line by line into a Jupyter notebook with the Scala (Toree) kernel, or into your favorite IDE with Scala and Spark dependencies.
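The sketch below fills in that UDF pattern under stated assumptions: the names addColumnUDF and inputDataFrame come from the text, the column names are invented, and because the result here is an integer the accessor is getInt rather than the getString mentioned above.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object AddColumnUdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-example").master("local[*]").getOrCreate()
    import spark.implicits._

    val inputDataFrame = Seq(("a", 1, 2), ("b", 3, 4)).toDF("id", "x", "y")

    // A UDF that combines two column values into one.
    val addColumnUDF = udf((x: Int, y: Int) => x + y)

    // Pass 2 columns of the DataFrame to the UDF; "sum" names the new column.
    val withSum = inputDataFrame.withColumn("sum", addColumnUDF($"x", $"y"))

    // select() the new column, collect() it into an Array[Row], read each value out.
    val values = withSum.select("sum").collect().map(_.getInt(0))
    println(values.mkString(", ")) // prints: 3, 7

    spark.stop()
  }
}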
A schema provides informational detail such as the column name, the type of data in that column, and whether null or empty values are allowed in the column. A case class defines exactly such a schema, which is also what makes converting an RDD to a data frame with a header straightforward. These snippets show how to make a DataFrame from scratch, using a list of values; in regular Scala code it's best to use List or Seq, but Arrays are frequently used with Spark.

A few environment notes. An existing Hive deployment is not necessary to use these features, because Spark will create a default local Hive metastore (using Derby) for you. The ORC format was introduced in Hive version 0.11. And ensure the code does not create a large number of partition columns with the datasets, otherwise the overhead of the metadata can cause significant slowdowns. On the query side, Spark offers aggregate functions similar to what we have in SQL, like MAX, MIN, and SUM, and supports filtering a row in a Spark DataFrame based on matching values from a list.

Renaming is a common follow-up. Here's an easy way to rename all columns in an Apache Spark DataFrame: call toDF with the full list of new names; technically, we're really creating a second DataFrame with the correct names. Both that approach and the one-column-at-a-time alternative appear in the sketch below.
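A minimal sketch of both rename approaches; the original names c1 and c2 appear earlier in the text, while c3 and the replacement names are hypothetical.

import org.apache.spark.sql.SparkSession

object RenameAllColumnsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rename-example").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "apple", 2.0), (2, "pear", 3.5)).toDF("c1", "c2", "c3")

    // toDF with a full list of names: really a second DataFrame with the correct names.
    val renamed = df.toDF("id", "fruit", "price")
    renamed.printSchema()

    // The one-at-a-time alternative: foldLeft over a rename map with withColumnRenamed.
    val renames = Map("c1" -> "id", "c2" -> "fruit", "c3" -> "price")
    val renamed2 = renames.foldLeft(df) { case (acc, (from, to)) =>
      acc.withColumnRenamed(from, to)
    }
    renamed2.printSchema()

    spark.stop()
  }
}

The toDF form is shorter when every column changes; the foldLeft form is better when only some names change.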
Creating new columns from existing ones in Spark dataframes is a frequently raised question. To check whether one event precedes another, for example, we can first create a new boolean column, pickup_1st, based on two datetime columns, and then count in how many records it is false. The same idea covers deriving one column from two others, such as converting a latitude column and a longitude column into a single point value, as well as the inverse operations: dropping data frame columns by name, or pivoting the data to create multiple columns out of 1 column with multiple rows. Spark also supports columns that contain arrays of values: the explode() method explodes, or flattens, an array such as a cities array into a new column named "city", and the split function combined with withColumn splits a single column into multiple columns.

Let us consider an example of employee records in a JSON file named employee.json; Spark reads it directly into a DataFrame. In the other direction, as per the Spark API documentation, def text(path: String): Unit saves the content of the DataFrame in a text file at the specified path. When performing joins in Spark, one question keeps coming up: when joining multiple dataframes, how do you prevent ambiguous column name errors? Start by preparing a couple of simple example dataframes and rename or drop the conflicting columns before joining. Spark functions can be stored in objects, which keeps such transformations reusable. The explode() behavior is sketched below.
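A short sketch of explode() on an array column. The cities array and the resulting "city" column come from the text; the sample rows are invented.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("explode-example").master("local[*]").getOrCreate()
    import spark.implicits._

    // Each row carries a person and an array of cities.
    val df = Seq(
      ("alice", Seq("Lyon", "Paris")),
      ("bob",   Seq("Oslo"))
    ).toDF("name", "cities")

    // explode() flattens the cities array into a new column named "city",
    // producing one output row per array element.
    val exploded = df.withColumn("city", explode($"cities")).drop("cities")
    exploded.show() // one (name, city) row per array element

    spark.stop()
  }
}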
Questions: I come from a pandas background and am used to reading data from CSV files into a dataframe and then simply changing the column names to something useful using the simple command df.columns = new_column_name_list. However, the same doesn't work on Spark dataframes created using sqlContext: there you rename with withColumnRenamed, or rebuild the frame with toDF, and to retrieve the column names, in both cases we can just type df.columns. The general form for adding a column with a specified expression is df.withColumn(col_name, col_expression).

With the recent changes in Spark 2.0, Spark SQL is now de facto the primary and feature-rich interface to Spark's underlying in-memory engine; before that, in Spark 1.6, the Project Tungsten initiative was introduced, which seeks to improve the performance and scalability of Spark. It helps to distinguish the different types of Spark functions: custom transformations, column functions, and UDFs. Use column functions when you need a custom Spark SQL function that operates on the columns or rows of a DataFrame. In Spark, you can also perform aggregate operations on a dataframe, such as df.groupBy(colname).agg(avg(colname)); see GroupedData for all the available aggregate functions, and the sketch below for the pattern. Finally, note that JSON is a popular input format, but JSON can get messy and parsing it can get tricky.
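A small sketch of the groupBy/agg pattern, assuming hypothetical category and price columns; the aggregates shown (sum, avg, min, max) are the SQL-style functions named above.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, max, min, sum}

object AggregationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("agg-example").master("local[*]").getOrCreate()
    import spark.implicits._

    val sales = Seq(("fruit", 2.0), ("fruit", 3.0), ("veg", 1.5)).toDF("category", "price")

    // groupBy returns a RelationalGroupedDataset (GroupedData);
    // agg then runs the aggregate functions once per group.
    val summary = sales
      .groupBy("category")
      .agg(sum("price").as("total"), avg("price").as("mean"),
           min("price").as("cheapest"), max("price").as("dearest"))
    summary.show()

    spark.stop()
  }
}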
Two frequent questions from R users: how do you get a specific row from a Spark dataframe (is there any alternative for df[100, c("column")] in Scala Spark data frames?), and how do you drop a nested column or change column types in Spark SQL's DataFrame? A distributed DataFrame has no positional row index, so specific rows are reached through filters or take/head, while column types are changed with cast. If you are working with Spark, you will most likely have to write transforms on dataframes, and the Apache Spark DataFrame API provides a rich set of functions (select columns, filter, join, aggregate, and so on) that allow you to solve common data analysis problems efficiently. For a purely demonstrative example, you could add a column containing the product between Age and Fare columns.

Schemas can also be stated explicitly: StructType objects contain a list of StructField objects that define the name, type, and nullable flag for each column in a DataFrame. A related task is to append multiple columns to an existing Spark dataframe where the column names are given in a List and the values for the new columns are constant; a foldLeft over the list does this without iterating over the column names by hand, as sketched below. Note also the spark.sql.retainGroupColumns configuration property, which controls whether to retain columns used for aggregation (in RelationalGroupedDataset operators), and the replacement behavior of withColumn: if a column is added to a DataFrame with a column of the same name, the new column replaces the old column. That is fortunate, because a DataFrame with multiple columns of the same name causes problems when you try to refer to columns by name.
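A minimal sketch of that foldLeft pattern, assuming hypothetical column names (source, batch) and a placeholder constant value.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object AppendConstantColumnsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("append-columns").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("id", "value")

    // Column names arrive in a List; every new column holds the same constant value.
    val newColumns = List("source", "batch")

    // foldLeft threads the DataFrame through one withColumn call per name,
    // so we never iterate over the column names by hand.
    val widened = newColumns.foldLeft(df)((acc, name) => acc.withColumn(name, lit("constant")))
    widened.show()

    spark.stop()
  }
}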
Complex column types go beyond arrays. When a map column is passed to explode, it creates two new columns, one for the key and one for the value, and each element in the map is split into its own row. On the typed side, a Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations, and the Scala interface for Spark SQL supports automatically converting an RDD containing case classes to a DataFrame. When working with SparkR and R, it is very important to understand that there are two different data frames in question: the R data.frame and the Spark DataFrame.

In order to work with RDDs we need to create a SparkContext object, and for DataFrames an entry point such as a SparkSession object. From there, typical column work includes renaming multiple columns of a dataframe in Spark Scala, creating a column whose value is derived from an existing column (df.withColumn("dt", column)), and the inverse selection of columns in a dataframe, excluding some columns from a query instead of listing the ones to keep. Scala supports extension methods through implicits, which we will use in an example below to extend the Spark DataFrame with a custom method; the same mechanism can add, say, a method to save a DataFrame to an Azure SQL table. On the storage side, the Optimized Row Columnar (ORC) file format is a highly efficient columnar format to store Hive data with more than 1,000 columns and improve performance, and the caching functionality can be tuned using the setConf method.
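A minimal sketch of the extension-method pattern: an implicit class that adds a hypothetical selectAllExcept helper performing the inverse column selection described above. Neither the class nor the method is a Spark API; both are assumptions for illustration, and the Azure SQL save method mentioned above would follow the same implicit-class shape with a JDBC write inside.

import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameExtensionsExample {
  // An implicit class adds new methods to DataFrame without modifying it.
  implicit class RichDataFrame(val df: DataFrame) extends AnyVal {
    // Hypothetical helper: keep every column except the listed ones.
    def selectAllExcept(exclude: Seq[String]): DataFrame = {
      val keep = df.columns.filterNot(exclude.contains)
      df.select(keep.head, keep.tail: _*)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("extensions").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a", true), (2, "b", false)).toDF("id", "name", "flag")

    // The implicit conversion makes selectAllExcept available on any DataFrame in scope.
    df.selectAllExcept(Seq("flag")).show()

    spark.stop()
  }
}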
To recap the creation paths: you can build a DataFrame from an RDD, a List, a Seq, or from TXT, CSV, JSON, and XML files, or a database. The underlying processing of dataframes is still done by RDDs, but in my opinion working with dataframes is easier than working with RDDs most of the time, which is a large part of why dataframes are so popular and why Spark has moved to a dataframe API since version 2.0. Related projects extend this further: GraphFrames is a package for Apache Spark that provides DataFrame-based graphs, with extended functionality such as motif finding. The Spark functions class also provides methods for many of the mathematical functions, statistical, trigonometrical, and so on.

Imagine we would like to have a table with an id column describing a user and then two columns for the number of cats and dogs she has; that is exactly a DataFrame built from a collection of tuples. Here's how to create an array of numbers with Scala: val numbers = Array(1, 2, 3); a DataFrame with an ArrayType column is created the same way from rows that carry such arrays, as sketched below. One caveat on grouping: the String-based groupBy is a variant that can only group by existing columns using column names (i.e., it cannot construct expressions). Nested data structures in JSON parse into the same kinds of struct and array columns.
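A sketch of both shapes, with invented sample data: the cats-and-dogs table from a collection of tuples, and an ArrayType column holding arrays of numbers.

import org.apache.spark.sql.SparkSession

object TuplesAndArraysExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("tuples-arrays").master("local[*]").getOrCreate()
    import spark.implicits._

    // A user id plus the number of cats and dogs she has, from a collection of tuples.
    val pets = Seq((1, 2, 0), (2, 0, 1), (3, 1, 1)).toDF("id", "cats", "dogs")
    pets.show()

    // An ArrayType column: each row holds an Array of numbers.
    val numbers = Seq(
      (1, Array(1, 2, 3)),
      (2, Array(4, 5))
    ).toDF("id", "numbers")
    numbers.printSchema() // numbers: array<int>

    spark.stop()
  }
}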
A few points on the type system tie all of this together. In Scala, DataFrame is now an alias representing a Dataset containing Row objects, where Row is a generic, untyped Java Virtual Machine (JVM) object; you can also define a Dataset of typed JVM objects and then manipulate them using functional transformations (map, flatMap, filter, and so on) similar to an RDD. A case class, such as an EmpRow class for employee records, is used in order to give the structure to the dataframe, and nullability is part of that structure: you can, for example, create a DataFrame with a name column that isn't nullable and an age column that is nullable. Spark SQL can automatically capture the schema of a JSON dataset and load it as a DataFrame; once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in DataFrame, Column, and functions.

Some operations reach below the column level. To change the schema of a data frame wholesale, we can operate on its RDD and then apply a new schema; renaming nested columns is done with withColumn by rebuilding the enclosing struct; and foldLeft can be used to eliminate all whitespace in multiple columns without writing one withColumn call per column, as the sketch below shows. All examples are in Scala.
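A minimal sketch of the whitespace-elimination idea, assuming hypothetical column names; regexp_replace is applied to every column via foldLeft.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

object TrimAllColumnsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("trim-columns").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((" a b ", "c  d"), ("e ", " f")).toDF("col1", "col2")

    // foldLeft visits every column name once, replacing each column in place
    // with a whitespace-free version of itself.
    val cleaned = df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, regexp_replace(col(name), "\\s+", ""))
    }
    cleaned.show()

    spark.stop()
  }
}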
In this article, we looked at different ways to use a Spark SQL StructType schema on a DataFrame with Scala examples, saw that there is indeed a simple way to select columns from a dataframe with a sequence of strings (select with the head of the sequence and a splatted tail), and built computed columns with UDFs, where the first parameter "sum" is the name of the new column and the second parameter is the call to the UDF "addColumnUDF".