Closed
What kind of issue is this?
- Bug report/enhancement request.
Issue description
ES Hadoop connector version: 2.x
A query like the following works:
{
"_source" : ["ID","Name"],
"query": {
"match_all": {}
}
}
ES Hadoop connector version: 5.x
The above query no longer works although it is still a valid ES query.
Steps to reproduce
Code:
// load data
val queryStr = "{ \"match_all\": {} }"
val fieldsToLoad = Array("ID", "Name")
// val qStr =
// s"""
// |{
// | "_source": [${fieldsToLoad.map(field=> "\"" + field + "\"").mkString(",")}],
// | "query": $queryStr
// |}
// """.stripMargin
val qStr =
s"""
|{
| "_source": [${fieldsToLoad.map(field=> "\"" + field + "\"").mkString(",")}]
|}
""".stripMargin
val loadedRDD = sc.esJsonRDD(s"${prefixIndexName}_*/taxis", qStr, esProps)
assert(loadedRDD != null)
assert(loadedRDD.count() == 3)
loadedRDD.foreach { pair =>
  val esID = pair._1
  assert(esID != null)
  val recordJson = pair._2
  assert(recordJson.contains("\"ID\""))
  assert(recordJson.contains("\"Name\""))
  assert(!recordJson.contains("\"StartDate\""))
}
Stack trace:
None
Explanation:
The _source clause is used to limit the fields being returned to optimize the payload.
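For reference, here is a minimal sketch of how the full request body (field filter plus query clause) can be assembled. `SourceFilterQuery` and `build` are hypothetical names used only for illustration and are not part of ES-Hadoop. The sketch assumes the field names contain no characters that need JSON escaping.

```scala
// Hypothetical helper (not part of ES-Hadoop): assembles a search body that
// limits the returned fields via "_source" and keeps the "query" clause.
object SourceFilterQuery {
  // fields: names to return; queryClause: a JSON object such as {"match_all":{}}
  // Assumption: field names need no JSON escaping.
  def build(fields: Seq[String], queryClause: String): String = {
    val quoted = fields.map(f => "\"" + f + "\"").mkString(",")
    s"""{"_source":[$quoted],"query":$queryClause}"""
  }

  def main(args: Array[String]): Unit = {
    val body = build(Seq("ID", "Name"), """{"match_all":{}}""")
    println(body)
  }
}
```

The resulting string is the same shape as the query shown at the top of this report and could be passed as the query argument to `sc.esJsonRDD`.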
Version Info
OS: :
JVM :
Hadoop/Spark:
ES-Hadoop : 5.0.1
ES : 5.0.1