path: root/tvix/store/src/import.rs
Age | Commit message | Author | Files | Lines
2024-04-20 | r/7981 refactor(tvix/castore/import): make module, split off fs and error | Florian Klink | 1 | -3/+3
Move error types and filesystem-specific functions to a separate file, and keep the fs:: namespace in public exports. Change-Id: I5e9e83ad78d9aea38553fafc293d3e4f8c31a8c1 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11486 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com> Autosubmit: flokli <flokli@flokli.de>
2024-04-19 | r/7979 refactor(tvix/castore): generalize store ingestion streams | Connor Brewster | 1 | -2/+2
Previously, the store ingestion code was coupled to the `walkdir::DirEntry` values produced by the `walkdir` crate, which made it impossible to reuse ingestion for other sources such as tarballs or NARs. This introduces an `IngestionEntry` which carries enough information for store ingestion and a future for computing the Blake3 digest of files. This allows the producer to perform file uploads in a way that makes sense for the source, i.e. the filesystem upload could upload multiple files concurrently, while the NAR ingestor needs to ingest the entire blob before yielding the next blob in the stream. In the future we can buffer small blobs and upload them concurrently, but the full blob still needs to be read from the NAR before advancing. Change-Id: I6d144063e2ba5b05e765bac1f27d41b3c8e7b283 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11462 Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
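To illustrate the decoupling described above, here is a minimal sketch of a source-agnostic entry type and a consumer that accepts entries from any producer. The variant names, the fields, and the `Summary` and `ingest_entries` helpers are assumptions made for illustration, not the crate's actual definitions, and the digest future mentioned in the message is left out.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified entry type. Variant and field names are
/// assumptions; the digest future from the commit message is omitted.
#[derive(Debug, Clone, PartialEq, Eq)]
enum IngestionEntry {
    Regular { path: PathBuf, size: u64, executable: bool },
    Symlink { path: PathBuf, target: Vec<u8> },
    Dir { path: PathBuf },
}

/// Hypothetical summary produced by the toy consumer below.
#[derive(Debug, Default, PartialEq, Eq)]
struct Summary {
    files: usize,
    symlinks: usize,
    dirs: usize,
    bytes: u64,
}

/// Consumes entries from any producer (filesystem walk, tarball, NAR reader).
/// The consumer never sees where the entries came from.
fn ingest_entries(entries: impl IntoIterator<Item = IngestionEntry>) -> Summary {
    let mut summary = Summary::default();
    for entry in entries {
        match entry {
            IngestionEntry::Regular { size, .. } => {
                summary.files += 1;
                summary.bytes += size;
            }
            IngestionEntry::Symlink { .. } => summary.symlinks += 1,
            IngestionEntry::Dir { .. } => summary.dirs += 1,
        }
    }
    summary
}
```

Because the consumer only depends on the entry type, a tarball or NAR producer can feed it without touching `walkdir`.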
2024-04-19 | r/7965 chore(tvix/store): migrate import.rs and tests/pathinfo.rs to rstest | Florian Klink | 1 | -9/+10
Also, rename the DUMMY_NAME constant in the fixtures to DUMMY_PATH, which aligns more with the ToString representation and from_bytes conversions we have on StorePath[Ref]. Change-Id: I39763c9dfa84c5d86f2fd0171b3a4d36fd72f267 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11464 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-01 | r/7839 refactor(tvix/store): generalize `PathInfo` constructors | Ryan Lahfa | 1 | -9/+32
Instead of enforcing NAR SHA256 all the time, we generalize the `PathInfo` constructor to take a `CAHash` argument which can drive whether we are having a flat, NAR or text scheme. With this, it is now possible to implement flat schemes in our evaluation builtins, e.g. `builtins.path`. Change-Id: I15bfee0ef4f0f428bfbd2f30c57c012cdcf6a976 Signed-off-by: Ryan Lahfa <tvl@lahfa.xyz> Reviewed-on: https://cl.tvl.fyi/c/depot/+/11286 Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2024-03-18 | r/7733 refactor(tvix/store/import): use B3Digest in log_node | Florian Klink | 1 | -5/+3
Change-Id: I2347bbae8e7d4e19eeed4a3fb13729d0a94feedd Reviewed-on: https://cl.tvl.fyi/c/depot/+/11195 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-03-15 | r/7700 docs(tvix): fix some docstrings | Florian Klink | 1 | -1/+1
Change-Id: Ife599387d0472cd746b992bd6755a2fb6a0e0dc4 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11158 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-01-20 | r/7433 feat(tvix/store): enable `name` customization in the store | Ryan Lahfa | 1 | -5/+5
Sometimes Nix lets the caller customize the `name` of a path in the store. This is the case for `builtins.path`, which takes a `name` argument. We leave it to the caller to choose the name, which can default to the basename of the path. Change-Id: Icdbf71d1d8f2dca5716b99d20aac885aab905b80 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10653 Tested-by: BuildkiteCI Autosubmit: raitobezarius <tvl@lahfa.xyz> Reviewed-by: flokli <flokli@flokli.de>
2024-01-20 | r/7432 refactor(tvix/store): `import_path` → `import_path_as_nar_ca` | Ryan Lahfa | 1 | -0/+156
Add multiple additional helpers, such as:
- `path_to_name`: derive the basename of a given path
- `derive_nar_ca_path_info`: derive the `PathInfo` for a content-addressed NAR

This further isolates the tree-walking feature from the ingestion feature. Additionally, we no longer `expect`, and instead properly propagate ingestion errors up. Change-Id: I60edb5b633911c58ade7e19f5002e6f75f90e262 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10574 Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Autosubmit: raitobezarius <tvl@lahfa.xyz>
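As a rough sketch of what a basename helper like `path_to_name` might look like, assuming it returns an error instead of panicking (in line with the note about propagating errors): the signature and the error kind below are assumptions, not the code from this change.

```rust
use std::io;
use std::path::Path;

/// Hypothetical sketch: derive a store name from the last path component.
/// Fails if the path has no final component (e.g. `/` or a path ending in
/// `..`) or if that component is not valid UTF-8.
fn path_to_name(path: &Path) -> io::Result<&str> {
    path.file_name()
        .and_then(|name| name.to_str())
        .ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::InvalidInput,
                "path has no UTF-8 file name",
            )
        })
}
```

Returning `io::Result` lets the caller decide how to report a path without a usable basename.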
2023-09-22 | r/6629 refactor(tvix): move castore into tvix-castore crate | Florian Klink | 1 | -199/+0
This splits the pure content-addressed layers from tvix-store into a `castore` crate, and only leaves PathInfo related things, as well as the CLI entrypoint in the tvix-store crate.
Notable changes:
- `fixtures` and `utils` had to be moved out of the `test` cfg, so they can be imported from tvix-store.
- Some ad-hoc fixtures in the test were moved to proper fixtures in the same step.
- The protos are now created by a (more static) recipe in the protos/ directory.

The (now two) golang targets are commented out, as it's not possible to update them properly in the same CL. This will be done by a followup CL once this is merged (and whitby deployed). Bug: https://b.tvl.fyi/issues/301 Change-Id: I8d675d4bf1fb697eb7d479747c1b1e3635718107 Reviewed-on: https://cl.tvl.fyi/c/depot/+/9370 Reviewed-by: tazjin <tazjin@tvl.su> Reviewed-by: flokli <flokli@flokli.de> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2023-09-21 | r/6623 refactor(tvix/store): Asyncify PathInfoService and DirectoryService | Connor Brewster | 1 | -0/+1
We've decided to asyncify all of the services to reduce some of the pains of going back and forth between sync<->async. The end goal will be for all the tvix-store internals to be async and then expose a sync interface for things like tvix eval io. Change-Id: I97c71f8db1d05a38bd8f625df5087d565705d52d Reviewed-on: https://cl.tvl.fyi/c/depot/+/9369 Autosubmit: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2023-09-18 | r/6606 refactor(tvix/store/blobsvc): make BlobStore async | Florian Klink | 1 | -9/+9
We previously kept the BlobService trait sync. This, however, had some annoying consequences:
- It became more and more complicated to track whether we're in a context with an async runtime or not, producing bugs like https://b.tvl.fyi/issues/304
- The sync trait shielded away async clients from async workloads, requiring manual block_on code inside the gRPC client code, and spawn_blocking calls in consumers of the trait, even if they were async (like the gRPC server)
- We had to write our own custom glue code (SyncReadIntoAsyncRead) to convert a sync io::Read into a tokio::io::AsyncRead, which already existed in tokio internally, but upstream is hesitant to expose.

This now makes the BlobService trait async (via the async_trait macro, like we already do in various gRPC parts), and replaces the sync readers and writers with their async counterparts.

Tests interacting with a BlobService now need to have an async runtime available. The easiest way to do this is to mark the test functions with the tokio::test macro, allowing us to directly .await in the test function.

In places where we don't have an async runtime available from context (like tvix-cli), we can pass one down explicitly.

Now that we don't provide a sync interface anymore, the (sync) FUSE library now holds a pointer to a tokio runtime handle, and needs to have at least 2 threads available when talking to a blob service (which is why some of the tests now use the multi_thread flavor).

The FUSE tests got a bit more verbose, as we couldn't use the setup_and_mount function accepting a callback anymore. We can hopefully move some of the test fixture setup to rstest in the future to make this less repetitive.

Co-Authored-By: Connor Brewster <cbrewster@hey.com> Change-Id: Ia0501b606e32c852d0108de9c9016b21c94a3c05 Reviewed-on: https://cl.tvl.fyi/c/depot/+/9329 Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-09-05 | r/6553 chore(tvix/store): drop walkdir workaround for symlinks at root | Florian Klink | 1 | -20/+4
https://github.com/BurntSushi/walkdir/pull/170 got merged, meaning we don't need to keep our own logic in here anymore. Our test cases already cover this. Change-Id: Ied3043ee651c8aafa10271c1e1ca5d460fb6c0b8 Reviewed-on: https://cl.tvl.fyi/c/depot/+/9269 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su>
2023-09-03 | r/6548 docs(tvix/store): address cargo doc warnings | Florian Klink | 1 | -3/+4
Fix some broken link references. Change-Id: I69c9b2b62af35bb777e4df1a01ba3181a368be47 Reviewed-on: https://cl.tvl.fyi/c/depot/+/9214 Reviewed-by: tazjin <tazjin@tvl.su> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2023-07-22 | r/6439 feat(tvix/store/proto): use Bytes instead of Vec<u8> | Florian Klink | 1 | -16/+19
Makes use of https://github.com/tokio-rs/prost/pull/341, which makes our bytes field cheaper to clone. It's a bit annoying to configure due to https://github.com/hyperium/tonic/issues/908, but the workaround does get the job done. Change-Id: I25714600b041bb5432d3adf5859b151e72b12778 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8975 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su> Autosubmit: flokli <flokli@flokli.de>
2023-07-21 | r/6436 refactor(tvix/store): use bytes for node names and symlink targets | Florian Klink | 1 | -26/+8
Some paths might use names that are not valid UTF-8. We should be able to represent them. We don't actually need to touch the PathInfo structures, as they need to represent StorePaths, which come with their own stricter restrictions and can't encode non-UTF-8 data. While this doesn't change any of the wire format of the gRPC messages, it does change the interface of tvix_eval::EvalIO: its read_dir() method now returns a list of Vec<u8>, rather than SmolStr. Maybe this should be OsString instead? Change-Id: I821016d9a58ec441ee081b0b9f01c9240723af0b Reviewed-on: https://cl.tvl.fyi/c/depot/+/8974 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: raitobezarius <tvl@lahfa.xyz> Tested-by: BuildkiteCI
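A small sketch of why byte-typed names matter: a name that is not valid UTF-8 cannot be held in a `String` or `&str`, but fits in a `Vec<u8>` unchanged. The `SymlinkNode` struct and the `display_name` helper are hypothetical stand-ins for illustration, not the generated protobuf types.

```rust
/// Hypothetical node type: names and targets are raw bytes, not `String`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SymlinkNode {
    name: Vec<u8>,
    target: Vec<u8>,
}

/// Renders a name for logging only. Invalid UTF-8 sequences are replaced
/// with U+FFFD, so the result must not be used as the real name.
fn display_name(name: &[u8]) -> String {
    String::from_utf8_lossy(name).into_owned()
}
```

The lossy conversion is kept to display code, so the stored bytes stay intact.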
2023-06-12 | r/6278 refactor(tvix/store/blobsvc): drop Result<_,_> around open_write | Florian Klink | 1 | -1/+1
We never returned Err here anyways, and we can still return an error during the first (or subsequent) write(s). Change-Id: I4b4cd3d35f6ea008e9ffe2f7b71bfc9187309e2f Reviewed-on: https://cl.tvl.fyi/c/depot/+/8750 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su>
2023-06-12 | r/6273 refactor(tvix/store): use Arc instead of Box | Florian Klink | 1 | -4/+10
This allows us to share blob services without closing them before putting them in a box. We currently need to use Arc<_>, not Rc<_>, because the gRPC wrappers require Sync. Change-Id: I679c5f06b62304f5b0456cfefe25a0a881de7c84 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8738 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de>
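The following sketch shows the general pattern of sharing one service handle across threads through `Arc<dyn Trait>`. The `BlobService` trait shown here, `MemoryBlobService`, and `upload_from_threads` are simplified, hypothetical stand-ins rather than the store's real API. The `Send + Sync` supertraits are what make the trait object usable from several threads, which an `Rc<_>` would not allow.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a blob service trait. The `Send + Sync`
/// supertraits allow `Arc<dyn BlobService>` to cross thread boundaries.
trait BlobService: Send + Sync {
    /// Stores a blob and returns the number of blobs stored so far.
    fn put(&self, data: &[u8]) -> usize;
    /// Returns the number of blobs stored.
    fn count(&self) -> usize;
}

/// Toy in-memory implementation, guarded by a mutex.
struct MemoryBlobService {
    blobs: Mutex<Vec<Vec<u8>>>,
}

impl BlobService for MemoryBlobService {
    fn put(&self, data: &[u8]) -> usize {
        let mut blobs = self.blobs.lock().unwrap();
        blobs.push(data.to_vec());
        blobs.len()
    }

    fn count(&self) -> usize {
        self.blobs.lock().unwrap().len()
    }
}

/// Hands a clone of the same handle to each worker thread.
fn upload_from_threads(svc: Arc<dyn BlobService>, workers: usize) {
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let svc = Arc::clone(&svc);
            thread::spawn(move || {
                svc.put(format!("blob {i}").as_bytes());
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Cloning the `Arc` only bumps a reference count, so every consumer talks to the same underlying service.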
2023-06-12 | r/6272 refactor(tvix/store): use Box<dyn DirectoryService> | Florian Klink | 1 | -4/+5
Once we support configuring services at runtime, we don't know what DirectoryService we're using at compile time. This also means we can't explicitly use the is_closed method from GRPCPutter without making it part of the DirectoryPutter itself. Change-Id: Icd2a1ec4fc5649a6cd15c9cc7db4c2b473630431 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8727 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI
2023-06-12 | r/6269 feat(tvix/store): eliminate generics in BlobStore | Florian Klink | 1 | -6/+7
To construct various stores at runtime, we need to eliminate associated types from the BlobService trait, and return Box<dyn …> instead of specific types. This also means we can't consume self in the close() method, so everything we write to is put in an Option<>, and during the first close we take from there. Change-Id: Ia523b6ab2f2a5276f51cb5d17e81a5925bce69b6 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8647 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su>
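The `Option<>` pattern mentioned above can be sketched as follows. `BlobWriter`, `MemoryBlobWriter`, and `open_write` are hypothetical simplifications: this sketch returns the number of bytes written where a real implementation would presumably return a digest, and treating a second `close` as an error is a choice of this sketch only.

```rust
use std::io::{self, Write};

/// Hypothetical writer trait usable behind `Box<dyn BlobWriter>`.
/// `close` takes `&mut self` rather than consuming `self`.
trait BlobWriter: Write {
    /// Finishes the blob. This sketch returns the byte count as a
    /// stand-in for a digest.
    fn close(&mut self) -> io::Result<usize>;
}

/// In-memory writer; `buf` is `Some` while open and `None` once closed.
struct MemoryBlobWriter {
    buf: Option<Vec<u8>>,
}

impl Write for MemoryBlobWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        match self.buf.as_mut() {
            Some(buf) => {
                buf.extend_from_slice(data);
                Ok(data.len())
            }
            None => Err(io::Error::new(io::ErrorKind::BrokenPipe, "already closed")),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl BlobWriter for MemoryBlobWriter {
    fn close(&mut self) -> io::Result<usize> {
        // `take()` moves the buffer out and leaves `None` behind, which
        // gives us ownership without consuming `self`.
        match self.buf.take() {
            Some(buf) => Ok(buf.len()),
            None => Err(io::Error::new(io::ErrorKind::BrokenPipe, "already closed")),
        }
    }
}

/// Returns a boxed writer; the caller does not learn the concrete type.
fn open_write() -> Box<dyn BlobWriter> {
    Box::new(MemoryBlobWriter { buf: Some(Vec::new()) })
}
```

Callers hold a `Box<dyn BlobWriter>`, write through the regular `io::Write` methods, and call `close()` once at the end.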
2023-05-25 | r/6200 refactor(tvix/store): drop mut self borrow in ingest_path | Florian Klink | 1 | -3/+3
With traverse_to not requiring a &mut anymore, we can drop the &mut self in all these function signatures. Change-Id: I22105376b625cb281c39e92d3206df8a6ce97a88 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8629 Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su> Autosubmit: flokli <flokli@flokli.de>
2023-05-17 | r/6149 refactor(tvix/store): rename import::{import_path -> ingest_path} | Florian Klink | 1 | -6/+7
This distinguishes it better from the EvalIO::import_path method. Also update the docstring to explain what it does (and what it doesn't). Change-Id: I32a8b2869fa67a894df28532b22bf170961a2abf Reviewed-on: https://cl.tvl.fyi/c/depot/+/8578 Reviewed-by: tazjin <tazjin@tvl.su> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2023-05-11 | r/6133 refactor(tvix/store): remove ChunkService | Florian Klink | 1 | -38/+16
Whether chunking is involved or not is an implementation detail of each Blobstore. Consumers of a whole blob shouldn't need to worry about that. It currently is not visible in the gRPC interface either. It shouldn't bleed into everything. Let the BlobService trait provide `open_read` and `open_write` methods, which return handles providing io::Read or io::Write, and leave the details up to the implementation. This means our custom BlobReader module can go away, and all the chunking bits in there, too. In the future, we might still want to add more chunking-aware syncing, but as a syncing strategy some stores can expose, not as a fundamental protocol component. This currently needs "SyncReadIntoAsyncRead", taken and vendored in from https://github.com/tokio-rs/tokio/pull/5669. It provides an AsyncRead for a sync Read, which is necessary to connect our (sync) BlobReader interface to a gRPC server implementation. As an alternative, we could also make the BlobReader itself async, and let consumers of the trait (EvalIO) deal with the async-ness, but this is less of a change for now. In terms of vendoring, I initially tried to move our tokio crate to these commits, but ended up with version incompatibilities, so let's vendor it in for now. Change-Id: I5969ebbc4c0e1ceece47981be3b9e7cfb3f59ad0 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8551 Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su>
2023-04-07 | r/6074 refactor(tvix/store/import): use DirectoryPutter in import.rs | Florian Klink | 1 | -5/+7
This should allow import_path to communicate to a gRPC remote store, that actually verifies the Directory nodes are interconnected. Change-Id: Ic5d28c33518f50dedec15f1732d81579a3afaff1 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8357 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI
2023-03-16 | r/6014 refactor(tvix/store/directorysvc): use [u8; 32] instead of Vec<u8> | Florian Klink | 1 | -1/+1
Also, simplify the trait interface, only allowing lookups of Directory objects by their digest. Change-Id: I6eec28a8cb0557bed9b69df8b8ff99a5e0f8fe35 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8313 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su>
2023-03-13 | r/5958 refactor(tvix/store): add read_all_and_chunk method | Florian Klink | 1 | -41/+4
This moves the logic from src/import.rs that
- reads over the contents of a file
- chunks them up and uploads individual chunks
- keeps track of the uploaded chunks in a BlobMeta structure
- returns the hash of the blob and the BlobMeta structure

… into a generic read_all_and_chunk function in src/chunkservice/util.rs. It will work on anything implementing io::Read, not just files, which will help us in a bit. Change-Id: I53bf628114b73ee2e515bdae29974571ea2b6f6f Reviewed-on: https://cl.tvl.fyi/c/depot/+/8259 Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz> Autosubmit: flokli <flokli@flokli.de>
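The steps listed above can be sketched with a function that is generic over `io::Read`. This is a simplified illustration under stated assumptions: it cuts fixed-size chunks instead of content-defined ones, it skips hashing entirely, and the `BlobMeta` struct and the `upload` callback are hypothetical stand-ins for the real metadata type and chunk service.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for the blob metadata: just the chunk sizes.
#[derive(Debug, Default, PartialEq, Eq)]
struct BlobMeta {
    chunk_sizes: Vec<usize>,
}

/// Reads `reader` to the end, cuts the data into chunks of at most
/// `chunk_size` bytes, passes each chunk to `upload`, and records the
/// chunk sizes. Fixed-size chunking is a simplification of this sketch.
fn read_all_and_chunk<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut upload: impl FnMut(&[u8]),
) -> io::Result<BlobMeta> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut meta = BlobMeta::default();
    let mut buf = Vec::with_capacity(chunk_size);
    loop {
        buf.clear();
        // `take` caps this read at one chunk; `read_to_end` handles
        // short reads until the cap or EOF is reached.
        let n = reader
            .by_ref()
            .take(chunk_size as u64)
            .read_to_end(&mut buf)?;
        if n == 0 {
            break;
        }
        upload(&buf);
        meta.chunk_sizes.push(n);
    }
    Ok(meta)
}
```

Since the function only requires `Read`, the same code path serves files, byte slices, or network streams.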
2023-03-11 | r/5953 refactor(tvix/store): bump fastcdc dep | Florian Klink | 1 | -1/+1
This removes the use of Box::new, switching fastcdc to version 3.0.2 with https://github.com/nlfiedler/fastcdc-rs/issues/25 fixed. Change-Id: I64f388b9e0a7f358e25a8bb7ca0e4df1d3bb01c4 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8249 Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: tazjin <tazjin@tvl.su> Autosubmit: flokli <flokli@flokli.de>
2023-03-11 | r/5952 refactor(tvix/store): factor out hash update into function | Florian Klink | 1 | -7/+6
We're using this in a bunch of places. Let's move it into a helper function. Change-Id: I118fba35f6d343704520ba37280e4ca52a61da44 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8251 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-03-11 | r/5950 feat(tvix/store/import): use StreamCDC instead of blobwriter | Florian Klink | 1 | -9/+35
This seems to be way faster. Change-Id: Ica7cee95d108c51fe67365f07366634ddbbfa060 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8246 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Reviewed-by: tazjin <tazjin@tvl.su> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2023-03-10 | r/5939 feat(tvix/store): use rayon to upload chunks concurrently | Florian Klink | 1 | -2/+2
Look at the data that's written to us, and upload all chunks but the rest in parallel, using rayon. This required moving `upload_chunk` outside the struct, and accepting a ChunkService to use for upload (which it was previously getting from `self.chunk_service`). This doesn't speed up things too much for now, because things are still mostly linear. Change-Id: Id785b5705c3392214d2da1a5b6a182bcf5048c8d Reviewed-on: https://cl.tvl.fyi/c/depot/+/8195 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2023-03-10 | r/5938 feat(tvix/store/import): make sure entries are sorted | Florian Klink | 1 | -2/+5
The Directory service does already reject inserting invalid (wrongly sorted) Directory messages, but our test case didn't provoke it. Change-Id: I228e201925e8999186659a2d8da0118db184d9ab Reviewed-on: https://cl.tvl.fyi/c/depot/+/8167 Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz>
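A minimal sketch of the sorting requirement, assuming that a valid listing means names in strictly increasing byte order (that exact rule, the `Entry` type, and both helper functions are assumptions of this sketch, not the service's actual validation code):

```rust
/// Hypothetical directory entry; only the name matters for ordering.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    name: Vec<u8>,
}

/// Returns true if names are strictly increasing in byte order,
/// i.e. sorted and free of duplicates.
fn is_sorted_unique(entries: &[Entry]) -> bool {
    entries.windows(2).all(|pair| pair[0].name < pair[1].name)
}

/// Sorts entries by name (byte-wise) before they are put into a listing.
fn sort_entries(entries: &mut [Entry]) {
    entries.sort_by(|a, b| a.name.cmp(&b.name));
}
```

Byte-wise ordering places uppercase ASCII names before lowercase ones, which is easy to miss when test fixtures only use lowercase names.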
2023-03-10 | r/5932 feat(tvix/store): add import::import_path | Florian Klink | 1 | -0/+248
This imports the contents at a given Path into the tvix store. It doesn't register the contents at a Path in the store itself; that's up to the PathInfoService. Change-Id: I2c493532d65b90f199ddb7dfc90249f5c2957dee Reviewed-on: https://cl.tvl.fyi/c/depot/+/8159 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Tested-by: BuildkiteCI