I am trying to run the piece of Spark code below in PySpark and getting an error. Could you please help me understand what is missing?

p1 = pd.DataFrame(final_data, columns=['Year', 'Name', 'Sex', 'Count'])
h1 = sqlContext.createDataFrame(p1)
h1.registerTempTable('namesdb')
sqlContext.sql("select SUBSTR(Name, 1, 1) as char1, count(Name) FROM namesdb group by char1 order by char1 ASC").toPandas()
But I am getting the error below:

AnalysisException: u"cannot resolve 'char1' given input columns: [Year, Name, Sex, Count];"
Here are the sample records for final_data:

final_data[:2]
[[1880, 'Mary', 'F', '7065'], [1880, 'Anna', 'F', '2604']]
In this version of Spark SQL you cannot reference the column alias char1 in the GROUP BY clause; instead, repeat the expression itself in the GROUP BY, like this:
select SUBSTR(Name, 1, 1) as char1, count(Name) FROM namesdb group by SUBSTR(NAME,1,1) order by char1 ASC
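The same group-by-first-character logic can be sanity-checked without a Spark cluster using pandas, which the question already imports. This is a minimal sketch; the third sample row is made up to make the counts non-trivial:

```python
import pandas as pd

# Sample rows in the same shape as final_data (the 'Alice' row is hypothetical)
final_data = [[1880, 'Mary', 'F', '7065'],
              [1880, 'Anna', 'F', '2604'],
              [1880, 'Alice', 'F', '1414']]
p1 = pd.DataFrame(final_data, columns=['Year', 'Name', 'Sex', 'Count'])

# Equivalent of: select SUBSTR(Name,1,1) as char1, count(Name)
#                group by SUBSTR(Name,1,1) order by char1 ASC
counts = (p1.assign(char1=p1['Name'].str[0])   # derive the first character
            .groupby('char1')['Name']          # group on that derived column
            .count()                           # count names per group
            .sort_index())                     # order by char1 ascending
print(counts.to_dict())  # {'A': 2, 'M': 1}
```

Note that pandas (and Spark's DataFrame API via `groupBy(substring(...))`) lets you group on a derived column directly, which sidesteps the alias restriction entirely.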
Answer author Camaris