Skip to content

Conversation

@JoshMock
Copy link
Member

@JoshMock JoshMock commented Dec 15, 2025

Uses work from elastic/elastic-transport-js#337. Part of #3080.

Should be run with env vars ES_URL, ES_USERNAME, and ES_PASSWORD. Example:

npm install
env ES_URL=http://localhost:9200 \
  ES_USERNAME=elastic \
  ES_PASSWORD=changeme \
  node test/ingest/base64.mjs

@prodsecmachine
Copy link

prodsecmachine commented Dec 15, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@JoshMock JoshMock requested a review from margaretjgu December 15, 2025 22:26
@JoshMock JoshMock marked this pull request as ready for review December 15, 2025 22:27

const start = Date.now()

const lines = raw.split(/[\r\n]+/).filter(row => row.trim().length > 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parsing of dataset should be moved before the start

corpus deserialization and file IO should not be part of the benchmark result

Copy link
Member Author

@JoshMock JoshMock Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a question of whether splitting a string into an array of strings is considered deserialization or serialization. 😆

I download, unzip and read the file into a string before starting to measure the time, but this feels like a gray area that we should clear up. Opened up a question about it in the Google doc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea also l120 for JSON.parse() would count as deserialization.

const serializer = new Serializer()
let chunk = []

await client.indices.create(indexSettings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should add a refresh call to ensure the index is fully created before starting the ingestion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 63277f6


const duration = Date.now() - start

await client.indices.delete({ index: indexName })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index is deleted but there's no check/delete at the start of each pass before client.indices.create(). If a previous run crashed, the index might already exist...to be extra safe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 47044ff.

base64Duration.push(await index(chunk_size, true))
}

measurement.float32 = { duration: float32Duration.reduce((a, b) => a + b, 0) / float32Duration.length }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we need documents/second metric instead as int he doc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked a clarifying question about that in the doc.

const client = new Client({
node: process.env.ES_URL,
auth: {
username: process.env.ES_USERNAME,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm just when we're testing to use http instead of https as suggested in the doc

@JoshMock
Copy link
Member Author

Will wait to get another review once questions get answered in the benchmark methodology document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants