
Query with _source filter fails in version 5.x #901

@sherry-ger


What kind of issue is this?

  • Bug report/enhancement request.

Issue description

ES Hadoop connector version: 2.x

A query like the following works:

{
	"_source" : ["ID","Name"],
	"query": {
		"match_all": {}
	}
}
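For reference, a body of this shape can be assembled from a field list and a query string. This is a minimal sketch; the object and method names (`SourceFilterQuery.build`) are hypothetical, and it assumes field names contain no quotes or backslashes (no JSON escaping is done).

```scala
object SourceFilterQuery {
  // Quote a field name as a JSON string (assumes no characters that need escaping).
  private def quote(field: String): String = "\"" + field + "\""

  // Build a search body of the form {"_source": [...], "query": <query>}.
  def build(fields: Seq[String], query: String): String =
    s"""{"_source":[${fields.map(quote).mkString(",")}],"query":$query}"""
}
```

For example, `SourceFilterQuery.build(Seq("ID", "Name"), """{"match_all":{}}""")` produces `{"_source":["ID","Name"],"query":{"match_all":{}}}`, which is the query above without the whitespace.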

ES Hadoop connector version: 5.x

With connector 5.x, the same query no longer works, even though it is still a valid Elasticsearch query.

Steps to reproduce

Code:

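    // Assumption: `sc` (SparkContext), `prefixIndexName` and `esProps`
    // (Map[String, String] of connector settings) are defined elsewhere in the test.
    import org.elasticsearch.spark._ // adds esJsonRDD to SparkContext

    // load data
    val queryStr = "{ \"match_all\": {} }"
    val fieldsToLoad = Array("ID", "Name")
    // Variant with both "_source" and "query" (same shape as the query above):
//    val qStr =
//      s"""
//        |{
//        |  "_source": [${fieldsToLoad.map(field=> "\"" + field + "\"").mkString(",")}],
//        |  "query": $queryStr
//        |}
//      """.stripMargin
    // Variant with only the "_source" filter:
    val qStr =
      s"""
         |{
         |  "_source": [${fieldsToLoad.map(field => "\"" + field + "\"").mkString(",")}]
         |}
      """.stripMargin
    // Read raw documents as (document id, source JSON) pairs
    val loadedRDD = sc.esJsonRDD(s"${prefixIndexName}_*/taxis", qStr, esProps)
    assert(loadedRDD != null)
    assert(loadedRDD.count() == 3)
    loadedRDD.foreach( pair => {
      val esID = pair._1
      assert(esID != null)

      // Only the requested fields should appear in the returned source
      val recordJson = pair._2
      assert(recordJson.contains("\"ID\""))
      assert(recordJson.contains("\"Name\""))
      assert(!recordJson.contains("\"StartDate\""))
    })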
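The per-record assertions above can also be expressed as a pure function, so the same check can be run on a sample document without a cluster. This is a sketch; `ProjectionCheck.hasOnly` is a hypothetical helper, and like the original assertions it uses plain substring matching on quoted field names rather than parsing the JSON.

```scala
object ProjectionCheck {
  // True if every expected field name appears as a quoted key and
  // no excluded field name appears anywhere as a quoted string.
  def hasOnly(json: String, expected: Seq[String], excluded: Seq[String]): Boolean =
    expected.forall(f => json.contains("\"" + f + "\"")) &&
      excluded.forall(f => !json.contains("\"" + f + "\""))
}
```

For example, `ProjectionCheck.hasOnly("""{"ID":1,"Name":"a"}""", Seq("ID", "Name"), Seq("StartDate"))` is `true`, while adding a `"StartDate"` key to that document makes it `false`.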

Stack trace:
None

Explanation:

The _source clause limits which fields are returned for each document, which reduces the response payload.
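As a rough mental model only (not the Elasticsearch implementation), an include-only _source list behaves like a key filter on each document's top-level fields. The sketch below ignores nested paths and wildcard patterns, which real source filtering supports; `SourceFilterDemo.filterSource` is a hypothetical name.

```scala
object SourceFilterDemo {
  // Keep only the requested top-level fields of a document.
  // Nested paths ("a.b") and wildcards ("Na*") are not handled here.
  def filterSource(doc: Map[String, Any], include: Seq[String]): Map[String, Any] =
    doc.filter { case (key, _) => include.contains(key) }
}
```

With `doc = Map("ID" -> 1, "Name" -> "Alice", "StartDate" -> "2016-11-01")`, `filterSource(doc, Seq("ID", "Name"))` keeps `ID` and `Name` and drops `StartDate`, which is what the assertions in the reproduction expect from the returned JSON.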

Version Info

OS:
JVM:
Hadoop/Spark:
ES-Hadoop : 5.0.1
ES : 5.0.1
