-
Notifications
You must be signed in to change notification settings - Fork 723
Benchmark script for base64 ingest #3087
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
67b2ac0 to
a4218d1
Compare
c463e2d to
7712821
Compare
7712821 to
6448edb
Compare
|
|
||
| const start = Date.now() | ||
|
|
||
| const lines = raw.split(/[\r\n]+/).filter(row => row.trim().length > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parsing of dataset should be moved before the start
corpus deserialization and file IO should not be part of the benchmark result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a question of whether splitting a string into an array of strings is considered deserialization or serialization. 😆
I download, unzip and read the file into a string before starting to measure the time, but this feels like a gray area that we should clear up. Opened up a question about it in the Google doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea also l120 for JSON.parse() would count as deserialization.
| const serializer = new Serializer() | ||
| let chunk = [] | ||
|
|
||
| await client.indices.create(indexSettings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should add a refresh call to ensure the index is fully created before starting the ingestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 63277f6
|
|
||
| const duration = Date.now() - start | ||
|
|
||
| await client.indices.delete({ index: indexName }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index is deleted but there's no check/delete at the start of each pass before client.indices.create(). If a previous run crashed, the index might already exist...to be extra safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 47044ff.
| base64Duration.push(await index(chunk_size, true)) | ||
| } | ||
|
|
||
| measurement.float32 = { duration: float32Duration.reduce((a, b) => a + b, 0) / float32Duration.length } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we need documents/second metric instead as int he doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asked a clarifying question about that in the doc.
| const client = new Client({ | ||
| node: process.env.ES_URL, | ||
| auth: { | ||
| username: process.env.ES_USERNAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm just when we're testing to use http instead of https as suggested in the doc
|
Will wait to get another review once questions get answered in the benchmark methodology document. |
Uses work from elastic/elastic-transport-js#337. Part of #3080.
Should be run with env vars
ES_URL,ES_USERNAME, andES_PASSWORD. Example: